
feats(transformers): add glm4v_moe model #1447

Open
iugoood wants to merge 1 commit into mindspore-lab:master from iugoood:glm4vmoe

Conversation

@iugoood (Contributor) commented Dec 4, 2025

Add:
1. Add the glm4v_moe model
2. Add unit tests

Note: not validated against the officially released weights (zai-org/GLM-4.6V) because of their size (>100B parameters); ZeRO-3 distributed inference could be tried for validation.

@iugoood iugoood requested a review from vigo999 as a code owner December 4, 2025 01:35
@iugoood iugoood changed the title from "featsadd glm4v_moe model" to "feats(transformers):add glm4v_moe model" Dec 4, 2025
@gemini-code-assist (Contributor)

Summary of Changes

Hello @iugoood, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the GLM-4.1V Mixture-of-Experts (MoE) model to the mindone.transformers library. This significant feature expands the library's multimodal capabilities by providing a MindSpore-compatible implementation of a model designed for complex text and vision tasks. The changes involve adding the core model architecture, integrating it into the library's auto-loading mechanisms, and ensuring its functionality through a new test suite.

Highlights

  • New Model Addition: The GLM-4.1V Mixture-of-Experts (MoE) model has been added to the mindone.transformers library, enabling support for this advanced multimodal architecture.
  • MindSpore Adaptation: The implementation of the GLM-4.1V MoE model is adapted from the Hugging Face Transformers library, specifically modified to run efficiently on the MindSpore framework.
  • Architectural Components: The new model includes specialized components such as Glm4vMoeRMSNorm, Glm4vMoeTextAttention, Glm4vMoeTextTopkRouter, Glm4vMoeTextMoE, Glm4vMoeVisionModel, and multimodal rotary position embeddings, reflecting its complex text and vision processing capabilities.
  • Auto-Configuration and Auto-Modeling Integration: The GLM-4.1V MoE model and its configuration (Glm4vMoeConfig) have been integrated into the mindone.transformers auto-loading system, allowing for seamless instantiation via AutoConfig and AutoModel.
  • Comprehensive Testing: A dedicated test suite (test_modeling_glm4v_moe.py) has been added to verify the correctness and numerical consistency of the MindSpore implementation against its PyTorch counterpart.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for the GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor)

Summary of Changes

This pull request significantly expands the mindone/transformers library by integrating the GLM4v-MoE model. This new model is a multimodal transformer that leverages a Mixture-of-Experts architecture, enabling it to process and generate content based on both text and visual inputs. The changes involve adding the complete model architecture, updating auto-configuration mechanisms, and introducing a dedicated test suite to ensure its proper functioning within the MindSpore framework.

Highlights

  • New Model Integration: Introduced the Glm4vMoe model, a multimodal Mixture-of-Experts (MoE) transformer, into the mindone/transformers library.
  • Core Model Components: Added comprehensive implementation for Glm4vMoe including Glm4vMoeModel, Glm4vMoeTextModel, Glm4vMoeVisionModel, and Glm4vMoeForConditionalGeneration, along with their sub-components like attention, MLP, and MoE routing.
  • Auto-Configuration Support: Updated mindone/transformers/models/auto modules to automatically recognize and load Glm4vMoe configurations and models.
  • Dedicated Test Suite: Included a new test file test_modeling_glm4v_moe.py to validate the functionality and numerical consistency of the Glm4vMoeForConditionalGeneration model.

@gemini-code-assist (Contributor)

Summary of Changes

This pull request significantly expands the mindone/transformers library by adding the glm4v_moe model. This new model is designed for multimodal tasks, combining text and vision processing capabilities. It features a Mixture-of-Experts architecture for efficient text handling and specialized components for processing visual inputs, including advanced rotary position embeddings. The integration ensures seamless use within the existing framework, supported by updated auto-loading mechanisms and a dedicated test suite.

Highlights

  • New Model Integration: Introduced the "glm4v_moe" model, a multimodal Mixture-of-Experts (MoE) transformer, into the mindone/transformers library.
  • Core Model Implementation: Added comprehensive code for the "glm4v_moe" model, including its text and vision components, attention mechanisms, and MoE routing logic.
  • Auto-Configuration Support: Integrated the new model into the auto-configuration and auto-modeling systems, making it easily discoverable and usable.
  • Dedicated Test Suite: Included a new test file to ensure the correct functionality and compatibility of the Glm4vMoeForConditionalGeneration model.

@gemini-code-assist bot left a comment

Code Review

This PR adds the glm4v_moe model. The changes are extensive, including the model implementation, boilerplate for auto-classes, and a new test file. The model implementation is complex, especially the multimodal parts and the Mixture of Experts (MoE) layers. I've found a few issues related to correctness, performance, and code clarity that should be addressed. Specifically, there's a typo in a class name, a potential dtype mismatch for Flash Attention, an inefficient and likely buggy loop in position index calculation, an inconsistency in token ID usage, and some dead code. The new test file is a good start but could be expanded to cover video inputs to catch some of these issues.

Comment on lines +878 to +884

```python
cu_seqlens = mindspore.mint.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]).cumsum(
    dim=0,
    # Select dtype based on the following factors:
    # - FA2 requires that cu_seqlens_q must have dtype int32
    # See https://github.com/huggingface/transformers/pull/34852 for more information
    dtype=grid_thw.dtype,
)
```
Severity: high

The comment indicates that Flash Attention 2 requires cu_seqlens to have a dtype of int32. However, the code sets dtype=grid_thw.dtype, which could be int64 and cause issues. It's safer to explicitly set the dtype to mindspore.int32 to ensure compatibility with Flash Attention 2.

Suggested change

```diff
 cu_seqlens = mindspore.mint.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]).cumsum(
     dim=0,
     # Select dtype based on the following factors:
     # - FA2 requires that cu_seqlens_q must have dtype int32
     # See https://github.com/huggingface/transformers/pull/34852 for more information
-    dtype=grid_thw.dtype,
+    dtype=mindspore.int32,
 )
```

Comment on lines +1178 to +1189

```python
for t_idx in range(llm_grid_t):
    t_index = (
        mindspore.Tensor(t_idx).view(-1, 1).expand((-1, llm_grid_h * llm_grid_w)).flatten()
    )

    h_index = (
        mindspore.mint.arange(llm_grid_h).view(1, -1, 1).expand((1, -1, llm_grid_w)).flatten()
    )
    w_index = (
        mindspore.mint.arange(llm_grid_w).view(1, 1, -1).expand((1, llm_grid_h, -1)).flatten()
    )
    llm_pos_ids_list.append(mindspore.mint.stack([t_index, h_index, w_index]) + st_idx)
```
Severity: high

The logic for calculating video position indices appears to be inefficient and potentially incorrect. The loop over llm_grid_t (which is video_frame_num) seems to have O(N^2) complexity with respect to the number of video frames, as it recomputes embeddings for previous frames at each step. Additionally, creating a tensor with mindspore.Tensor(t_idx) inside a loop is inefficient.

A more efficient and likely correct implementation would process one frame per video token without a loop, similar to how image position indices are calculated. This would improve performance and avoid redundant computations.
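For illustration, here is a framework-agnostic NumPy sketch of the vectorized alternative (the helper name and shapes are hypothetical, not part of the PR): a single meshgrid enumerates every (t, h, w) index once, with no per-frame recomputation. A MindSpore port would use `mint.arange`/`mint.meshgrid` with the same logic.

```python
import numpy as np

# Hypothetical helper sketching the vectorized position-index computation.
def grid_pos_indices(t, h, w):
    # meshgrid enumerates every (t_idx, h_idx, w_idx) combination exactly once,
    # so nothing is recomputed per frame.
    t_idx, h_idx, w_idx = np.meshgrid(
        np.arange(t), np.arange(h), np.arange(w), indexing="ij"
    )
    # shape (3, t * h * w), matching the stacked [t, h, w] index layout
    return np.stack([t_idx.ravel(), h_idx.ravel(), w_idx.ravel()])
```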

Comment on lines +1293 to +1299

```python
special_video_mask = inputs_embeds == self.get_input_embeddings()(
    mindspore.Tensor(
        self.config.video_token_id,
        dtype=mindspore.int64,
    )
)
special_video_mask = special_video_mask.all(-1)
```
Severity: high

There's an inconsistency in creating the special_video_mask. When input_ids is None, it uses self.config.video_token_id, but when input_ids is present, it uses self.config.image_token_id. Based on the comment on line 1301 and the logic in get_rope_index, it seems image_token_id is intended for video frames as well. To ensure consistency, image_token_id should be used in both cases.

Suggested change

```diff
 special_video_mask = inputs_embeds == self.get_input_embeddings()(
     mindspore.Tensor(
-        self.config.video_token_id,
+        self.config.image_token_id,
         dtype=mindspore.int64,
     )
 )
 special_video_mask = special_video_mask.all(-1)
```

```python
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


class Glm4vMoeisionMlp(mindspore.nn.Cell):
```
Severity: medium

There appears to be a typo in the class name. It should likely be Glm4vMoeVisionMlp instead of Glm4vMoeisionMlp.

Suggested change

```diff
-class Glm4vMoeisionMlp(mindspore.nn.Cell):
+class Glm4vMoeVisionMlp(mindspore.nn.Cell):
```

```python
        if expand_size == 1:
            return input_ids, model_kwargs

        visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw", "second_per_grid_ts"]
```
Severity: medium

The key second_per_grid_ts is included in visual_keys but appears to be unused throughout the model's forward pass. This suggests it might be dead code. To improve code clarity and maintainability, it should be removed from this list.

Suggested change

```diff
-visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw", "second_per_grid_ts"]
+visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw"]
```

Comment on lines +1748 to +1751

```python
elif key == "second_per_grid_ts":
    dict_to_expand[key] = _repeat_interleave_samples(
        dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
    )
```
Severity: medium

This block handles the second_per_grid_ts key, which seems to be unused in the model. This code block should be removed to eliminate dead code.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces the glm4v_moe model, a significant new feature. The implementation is comprehensive, covering the model architecture, auto-class registration, and initial tests. However, I have identified several critical and high-severity performance issues related to non-vectorized operations, such as loops over batch items or experts, and code that is not compatible with graph compilation. Addressing these points will be crucial for making the model efficient and usable in production environments. I've also noted a minor typo in a class name.

```python
        )
        image_index, video_index = 0, 0
        video_group_index = 0
        for i, input_ids in enumerate(total_input_ids):
```
Severity: critical

The for loop iterating over total_input_ids processes each item in the batch sequentially. This is highly inefficient and will be a major performance bottleneck, especially for larger batch sizes. The logic within the loop, including tolist() and itertools.groupby, is also not suitable for graph compilation and is slow. This entire function should be vectorized to process the whole batch at once using tensor operations.

Comment on lines +326 to +336

```python
for expert_idx in range(len(self.experts)):
    expert = self.experts[expert_idx]
    mask = expert_mask[expert_idx]
    token_indices, weight_indices = mindspore.mint.where(mask)

    if token_indices.numel() > 0:
        expert_weights = topk_weights[token_indices, weight_indices]
        expert_input = hidden_states[token_indices]
        expert_output = expert(expert_input)
        weighted_output = expert_output * expert_weights.unsqueeze(-1)
        final_hidden_states.index_add_(0, token_indices, weighted_output)
```
Severity: high

The loop over self.experts is a significant performance bottleneck, especially when the number of experts is large. This implementation will execute the expert forward pass sequentially for each expert, which is inefficient. To optimize this, consider vectorizing the expert computation. You could explore techniques used in other MoE implementations within this library (like Mixtral), which often involve grouping tokens by expert and performing batched computations to avoid Python loops.
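For illustration, a NumPy sketch (toy experts, hypothetical names) of the grouped dispatch this comment describes: a single stable argsort groups the routed token slots by expert, so each expert that received tokens runs exactly one batched call on a contiguous slice instead of a `where` mask per expert.

```python
import numpy as np

# Hypothetical sketch of grouped MoE dispatch; not the PR's implementation.
def moe_forward(hidden, topk_idx, topk_w, experts):
    n_tok, k = topk_idx.shape
    flat_expert = topk_idx.ravel()                  # expert id of each (token, slot)
    flat_tok = np.repeat(np.arange(n_tok), k)       # token owning each slot
    order = np.argsort(flat_expert, kind="stable")  # group slots by expert
    counts = np.bincount(flat_expert, minlength=len(experts))
    out = np.zeros_like(hidden)
    start = 0
    for e, c in enumerate(counts):      # the loop is over experts, but each
        if c == 0:                      # expert now runs one batched call
            continue
        sel = order[start:start + c]
        toks = flat_tok[sel]
        y = experts[e](hidden[toks])    # single batched expert forward
        w = topk_w.ravel()[sel][:, None]
        np.add.at(out, toks, y * w)     # scatter-add back to token rows
        start += c
    return out
```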

Comment on lines +1725 to +1727

```python
samples = mindspore.mint.split(image_grid_thw, list(image_nums))
# compute the sequence length of images for each sample
lengths = [mindspore.mint.prod(sample, dim=1).sum() for sample in samples]
```
Severity: high

The list comprehension used to calculate lengths involves tensor operations inside a loop, which is inefficient and not friendly to graph compilation. This can be vectorized using ops.segment_sum to improve performance significantly.

Suggested change

```diff
-samples = mindspore.mint.split(image_grid_thw, list(image_nums))
-# compute the sequence length of images for each sample
-lengths = [mindspore.mint.prod(sample, dim=1).sum() for sample in samples]
+# compute the sequence length of images for each sample as a segmented sum
+prods = mindspore.mint.prod(image_grid_thw, dim=1)
+segment_ids = mindspore.mint.repeat_interleave(
+    mindspore.mint.arange(image_nums.shape[0]), image_nums.to(mindspore.int32)
+)
+lengths = mindspore.ops.unsorted_segment_sum(prods, segment_ids, image_nums.shape[0]).tolist()
```
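The segmented-sum idea can be checked with a small NumPy sketch (toy values, not the PR's code; a weighted `np.bincount` plays the role of the segment sum):

```python
import numpy as np

# Toy inputs: three images with (t, h, w) grids, split 2 / 1 across two samples.
grid_thw = np.array([[1, 2, 2], [1, 4, 4], [2, 2, 2]])
image_nums = np.array([2, 1])

sizes = grid_thw.prod(axis=1)  # tokens per image: [4, 16, 8]
segment_ids = np.repeat(np.arange(len(image_nums)), image_nums)
# weighted bincount sums per-image sizes within each sample segment:
# sample 0 -> 4 + 16 = 20, sample 1 -> 8
lengths = np.bincount(segment_ids, weights=sizes, minlength=len(image_nums))
```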

```python
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


class Glm4vMoeisionMlp(mindspore.nn.Cell):
```
Severity: medium

There appears to be a typo in the class name Glm4vMoeisionMlp. It should likely be Glm4vMoeVisionMlp. This typo is also present at line 723 where the class is instantiated.

Suggested change

```diff
-class Glm4vMoeisionMlp(mindspore.nn.Cell):
+class Glm4vMoeVisionMlp(mindspore.nn.Cell):
```

```python
self.norm1 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.norm2 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.attn = Glm4vMoeVisionAttention(config)
self.mlp = Glm4vMoeisionMlp(config, bias=False)
```
Severity: medium

Correcting the typo in the class instantiation from Glm4vMoeisionMlp to Glm4vMoeVisionMlp to match the corrected class name.

Suggested change

```diff
-self.mlp = Glm4vMoeisionMlp(config, bias=False)
+self.mlp = Glm4vMoeVisionMlp(config, bias=False)
```

```python
def rot_pos_emb(self, grid_thw):
    pos_ids = []
    for t, h, w in grid_thw:
        t, h, w = int(t), int(h), int(w)
```
Severity: medium

The explicit casting of tensor elements to Python integers using int() will break graph compilation (mindspore.jit). The test file for this model currently only runs in PyNative mode, which might be why this was not caught. To make the model compatible with graph mode for better performance, you should use tensor operations instead. For instance, mindspore.mint.arange(h) can accept a 0-d tensor h.

Suggested change

```diff
-t, h, w = int(t), int(h), int(w)
+t, h, w = t.item(), h.item(), w.item()
```

@gemini-code-assist bot left a comment

Code Review

This pull request introduces the glm4v_moe model, a multimodal model with a Mixture-of-Experts architecture. The implementation is adapted from the HuggingFace Transformers library for MindSpore. The changes include the model definition, configuration, and integration into the auto-model classes, along with a new test suite.

My review has identified a few issues:

  • A critical bug in _expand_inputs_for_generation that would lead to a runtime error.
  • A performance issue in get_rope_index due to redundant computations inside a loop.
  • A typo in a class name (Glm4vMoeisionMlp).
  • A developer-facing comment in the moe method that should be removed from production code.

I've provided suggestions to fix these issues. After addressing them, the PR should be in good shape.

Comment on lines +1749 to +1751

```python
dict_to_expand[key] = _repeat_interleave_samples(
    dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
)
```
Severity: critical

The code for key == 'second_per_grid_ts' attempts to call _repeat_interleave_samples on a list, but this function expects a tensor. This will cause a runtime error. The list should be converted to a tensor before being passed to _repeat_interleave_samples and then converted back to a list.

Suggested change

```diff
-dict_to_expand[key] = _repeat_interleave_samples(
-    dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
-)
+tensor = mindspore.Tensor(dict_to_expand[key])
+tensor = _repeat_interleave_samples(
+    tensor, lengths=list(video_nums), repeat_times=expand_size
+)
+dict_to_expand[key] = tensor.tolist()
```

Comment on lines +318 to +321

```python
r"""
CALL FOR CONTRIBUTION! I don't have time to optimise this right now, but expert weights need to be fused
to not have to do a loop here (deepseek has 256 experts soooo yeah).
"""
```
Severity: medium

This docstring contains a "CALL FOR CONTRIBUTION" which seems like a developer note. It should be removed from the production codebase for clarity and professionalism. The comment also highlights a valid performance concern about the loop over experts, which could be addressed separately.

```python
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


class Glm4vMoeisionMlp(mindspore.nn.Cell):
```
Severity: medium

There's a typo in the class name. It should be Glm4vMoeVisionMlp instead of Glm4vMoeisionMlp. This typo also appears at line 723 where this class is instantiated.

Suggested change

```diff
-class Glm4vMoeisionMlp(mindspore.nn.Cell):
+class Glm4vMoeVisionMlp(mindspore.nn.Cell):
```

```python
self.norm1 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.norm2 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.attn = Glm4vMoeVisionAttention(config)
self.mlp = Glm4vMoeisionMlp(config, bias=False)
```
Severity: medium

This class name has a typo. It should be Glm4vMoeVisionMlp.

Suggested change

```diff
-self.mlp = Glm4vMoeisionMlp(config, bias=False)
+self.mlp = Glm4vMoeVisionMlp(config, bias=False)
```

Comment on lines +1183 to +1188

```python
h_index = (
    mindspore.mint.arange(llm_grid_h).view(1, -1, 1).expand((1, -1, llm_grid_w)).flatten()
)
w_index = (
    mindspore.mint.arange(llm_grid_w).view(1, 1, -1).expand((1, llm_grid_h, -1)).flatten()
)
```
Severity: medium

The computation of h_index and w_index is inside the for t_idx in range(llm_grid_t): loop, but they don't depend on t_idx. This leads to redundant computations in each iteration. Moving them outside the loop will improve performance.
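A NumPy sketch (toy grid sizes, stand-in for the `mint` ops) of the hoisting this comment suggests: `h_index` and `w_index` are built once, and the loop only fills in the frame-dependent `t` row.

```python
import numpy as np

# Toy grid; in the model these come from the video grid dimensions.
llm_grid_t, llm_grid_h, llm_grid_w = 2, 2, 3

# Invariant across frames: compute once, outside the t loop.
h_index = np.broadcast_to(
    np.arange(llm_grid_h)[:, None], (llm_grid_h, llm_grid_w)
).ravel()
w_index = np.broadcast_to(
    np.arange(llm_grid_w)[None, :], (llm_grid_h, llm_grid_w)
).ravel()

# The loop now only builds the cheap, frame-dependent t row.
frames = [
    np.stack([np.full(llm_grid_h * llm_grid_w, t), h_index, w_index])
    for t in range(llm_grid_t)
]
```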

@iugoood iugoood force-pushed the glm4vmoe branch 2 times, most recently from 7b5b8f0 to 096fbf1 Compare December 10, 2025 06:37