feat(transformers): add glm4v_moe model #1447

iugoood wants to merge 1 commit into mindspore-lab:master
Conversation
Summary of Changes

Hello @iugoood, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the mindone/transformers library by integrating the GLM4v-MoE model. This new model is a multimodal transformer that leverages a Mixture-of-Experts architecture, enabling it to process and generate content based on both text and visual inputs. The changes involve adding the complete model architecture, updating auto-configuration mechanisms, and introducing a dedicated test suite to ensure its proper functioning within the MindSpore framework.
Code Review
This PR adds the glm4v_moe model. The changes are extensive, including the model implementation, boilerplate for auto-classes, and a new test file. The model implementation is complex, especially the multimodal parts and the Mixture of Experts (MoE) layers. I've found a few issues related to correctness, performance, and code clarity that should be addressed. Specifically, there's a typo in a class name, a potential dtype mismatch for Flash Attention, an inefficient and likely buggy loop in position index calculation, an inconsistency in token ID usage, and some dead code. The new test file is a good start but could be expanded to cover video inputs to catch some of these issues.
```python
cu_seqlens = mindspore.mint.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]).cumsum(
    dim=0,
    # Select dtype based on the following factors:
    # - FA2 requires that cu_seqlens_q must have dtype int32
    # See https://github.com/huggingface/transformers/pull/34852 for more information
    dtype=grid_thw.dtype,
)
```
The comment indicates that Flash Attention 2 requires cu_seqlens to have a dtype of int32. However, the code sets dtype=grid_thw.dtype, which could be int64 and cause issues. It's safer to explicitly set the dtype to mindspore.int32 to ensure compatibility with Flash Attention 2.
```diff
 cu_seqlens = mindspore.mint.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]).cumsum(
     dim=0,
     # Select dtype based on the following factors:
     # - FA2 requires that cu_seqlens_q must have dtype int32
     # See https://github.com/huggingface/transformers/pull/34852 for more information
-    dtype=grid_thw.dtype,
+    dtype=mindspore.int32,
 )
```
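To make the int32 requirement concrete, here is a small NumPy sketch of the same computation (NumPy standing in for the MindSpore `mint` ops; the grid values are made up for illustration). Each of the `t` temporal slices contributes `h * w` patch tokens, and the cumulative boundaries are cast explicitly to 32-bit:

```python
import numpy as np

# Hypothetical (t, h, w) grid per image/video; NumPy defaults to int64.
grid_thw = np.array([[1, 4, 4], [2, 2, 2]], dtype=np.int64)

# h * w tokens per temporal slice, repeated t times per sample.
lengths = np.repeat(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0])

# Running boundaries (with a leading zero for illustration), forced to int32
# rather than inheriting int64 from the input grid.
cu_seqlens = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int32)
# cu_seqlens -> [ 0 16 20 24], dtype int32
```

Without the explicit cast, the cumulative sum inherits the input's int64 dtype, which is exactly the mismatch the review flags.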
```python
for t_idx in range(llm_grid_t):
    t_index = (
        mindspore.Tensor(t_idx).view(-1, 1).expand((-1, llm_grid_h * llm_grid_w)).flatten()
    )
    h_index = (
        mindspore.mint.arange(llm_grid_h).view(1, -1, 1).expand((1, -1, llm_grid_w)).flatten()
    )
    w_index = (
        mindspore.mint.arange(llm_grid_w).view(1, 1, -1).expand((1, llm_grid_h, -1)).flatten()
    )
    llm_pos_ids_list.append(mindspore.mint.stack([t_index, h_index, w_index]) + st_idx)
```
The logic for calculating video position indices appears to be inefficient and potentially incorrect. The loop over llm_grid_t (which is video_frame_num) seems to have O(N^2) complexity with respect to the number of video frames, as it recomputes embeddings for previous frames at each step. Additionally, creating a tensor with mindspore.Tensor(t_idx) inside a loop is inefficient.
A more efficient and likely correct implementation would process one frame per video token without a loop, similar to how image position indices are calculated. This would improve performance and avoid redundant computations.
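The loop-free approach the comment describes can be sketched with broadcasting (a NumPy toy, standing in for the MindSpore `mint` ops; the function name and grid sizes are illustrative, not from the PR):

```python
import numpy as np

def video_pos_indices(llm_grid_t, llm_grid_h, llm_grid_w):
    # Build all three index grids in one shot instead of looping over frames:
    # broadcasting expands each axis to shape (t, h, w), then flattening
    # yields one (frame, row, col) triple per token.
    t_index = np.arange(llm_grid_t).reshape(-1, 1, 1)
    h_index = np.arange(llm_grid_h).reshape(1, -1, 1)
    w_index = np.arange(llm_grid_w).reshape(1, 1, -1)
    t_index, h_index, w_index = np.broadcast_arrays(t_index, h_index, w_index)
    # Shape (3, t*h*w): row 0 = frame index, row 1 = row, row 2 = column.
    return np.stack([t_index.ravel(), h_index.ravel(), w_index.ravel()])

pos = video_pos_indices(2, 2, 3)
# pos[0] -> [0 0 0 0 0 0 1 1 1 1 1 1]   (frame index)
# pos[1] -> [0 0 0 1 1 1 0 0 0 1 1 1]   (row index)
# pos[2] -> [0 1 2 0 1 2 0 1 2 0 1 2]   (column index)
```

This computes every frame's indices once, so the cost is linear in the number of video tokens rather than quadratic in the number of frames.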
```python
special_video_mask = inputs_embeds == self.get_input_embeddings()(
    mindspore.Tensor(
        self.config.video_token_id,
        dtype=mindspore.int64,
    )
)
special_video_mask = special_video_mask.all(-1)
```
There's an inconsistency in creating the special_video_mask. When input_ids is None, it uses self.config.video_token_id, but when input_ids is present, it uses self.config.image_token_id. Based on the comment on line 1301 and the logic in get_rope_index, it seems image_token_id is intended for video frames as well. To ensure consistency, image_token_id should be used in both cases.
```diff
 special_video_mask = inputs_embeds == self.get_input_embeddings()(
     mindspore.Tensor(
-        self.config.video_token_id,
+        self.config.image_token_id,
         dtype=mindspore.int64,
     )
 )
 special_video_mask = special_video_mask.all(-1)
```
```python
if expand_size == 1:
    return input_ids, model_kwargs

visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw", "second_per_grid_ts"]
```
The key second_per_grid_ts is included in visual_keys but appears to be unused throughout the model's forward pass. This suggests it might be dead code. To improve code clarity and maintainability, it should be removed from this list.
```diff
-visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw", "second_per_grid_ts"]
+visual_keys = ["pixel_values", "image_grid_thw", "pixel_values_videos", "video_grid_thw"]
```
```python
elif key == "second_per_grid_ts":
    dict_to_expand[key] = _repeat_interleave_samples(
        dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
    )
```
Code Review
This pull request introduces the glm4v_moe model, a significant new feature. The implementation is comprehensive, covering the model architecture, auto-class registration, and initial tests. However, I have identified several critical and high-severity performance issues related to non-vectorized operations, such as loops over batch items or experts, and code that is not compatible with graph compilation. Addressing these points will be crucial for making the model efficient and usable in production environments. I've also noted a minor typo in a class name.
```python
)
image_index, video_index = 0, 0
video_group_index = 0
for i, input_ids in enumerate(total_input_ids):
```
The for loop iterating over total_input_ids processes each item in the batch sequentially. This is highly inefficient and will be a major performance bottleneck, especially for larger batch sizes. The logic within the loop, including tolist() and itertools.groupby, is also not suitable for graph compilation and is slow. This entire function should be vectorized to process the whole batch at once using tensor operations.
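The `tolist()`/`itertools.groupby` pattern in particular can be replaced with tensor ops. A NumPy sketch of the idea (illustrative token-type ids, not the model's actual values): a run starts wherever the value changes, so the change points give every segment's boundaries in one vectorized pass:

```python
import numpy as np

# Hypothetical per-sample token-type ids (0 = text, 1 = image, 2 = video).
token_types = np.array([0, 0, 1, 1, 1, 0, 2, 2, 0])

# groupby-style run detection with tensor ops: boundaries are change points.
change = np.nonzero(np.diff(token_types))[0] + 1
starts = np.concatenate([[0], change])
ends = np.concatenate([change, [token_types.size]])
runs = list(zip(token_types[starts].tolist(), starts.tolist(), ends.tolist()))
# runs -> [(0, 0, 2), (1, 2, 5), (0, 5, 6), (2, 6, 8), (0, 8, 9)]
```

The same boundary tensors can then drive per-segment position arithmetic for the whole batch without Python-level iteration.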
```python
for expert_idx in range(len(self.experts)):
    expert = self.experts[expert_idx]
    mask = expert_mask[expert_idx]
    token_indices, weight_indices = mindspore.mint.where(mask)

    if token_indices.numel() > 0:
        expert_weights = topk_weights[token_indices, weight_indices]
        expert_input = hidden_states[token_indices]
        expert_output = expert(expert_input)
        weighted_output = expert_output * expert_weights.unsqueeze(-1)
        final_hidden_states.index_add_(0, token_indices, weighted_output)
```
The loop over self.experts is a significant performance bottleneck, especially when the number of experts is large. This implementation will execute the expert forward pass sequentially for each expert, which is inefficient. To optimize this, consider vectorizing the expert computation. You could explore techniques used in other MoE implementations within this library (like Mixtral), which often involve grouping tokens by expert and performing batched computations to avoid Python loops.
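The grouping this comment alludes to can be sketched outside MindSpore. The NumPy toy below (expert MLPs stubbed as single weight matrices; all sizes and names hypothetical) sorts the flattened top-k assignments by expert so each expert runs once over a contiguous batch, rather than looping per token:

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts, top_k = 6, 4, 3, 2

hidden_states = rng.normal(size=(num_tokens, hidden))
# Stand-in "experts": one weight matrix each instead of a full MLP.
experts = [rng.normal(size=(hidden, hidden)) for _ in range(num_experts)]
topk_ids = rng.integers(0, num_experts, size=(num_tokens, top_k))
topk_weights = np.full((num_tokens, top_k), 1.0 / top_k)

# Flatten the (token, slot) assignments and sort them by expert so that
# each expert sees one contiguous block of tokens.
flat_expert = topk_ids.ravel()
flat_token = np.repeat(np.arange(num_tokens), top_k)
flat_weight = topk_weights.ravel()
order = np.argsort(flat_expert, kind="stable")

# Boundaries of each expert's block within the sorted assignment list.
counts = np.bincount(flat_expert, minlength=num_experts)
starts = np.concatenate([[0], np.cumsum(counts)])

out = np.zeros_like(hidden_states)
for e in range(num_experts):  # one batched matmul per expert, no per-token work
    sel = order[starts[e]:starts[e + 1]]
    if sel.size:
        tok = flat_token[sel]
        # np.add.at accumulates correctly even when a token routed to the
        # same expert in more than one top-k slot.
        np.add.at(out, tok, (hidden_states[tok] @ experts[e]) * flat_weight[sel, None])
```

The remaining loop is over experts only; fully fusing it would additionally require stacking the expert weights into a single tensor, which is the optimization the model's own `CALL FOR CONTRIBUTION` docstring asks for.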
```python
samples = mindspore.mint.split(image_grid_thw, list(image_nums))
# compute the sequence length of images for each sample
lengths = [mindspore.mint.prod(sample, dim=1).sum() for sample in samples]
```
The list comprehension used to calculate lengths involves tensor operations inside a loop, which is inefficient and not friendly to graph compilation. This can be vectorized using ops.segment_sum to improve performance significantly.
```diff
-samples = mindspore.mint.split(image_grid_thw, list(image_nums))
-# compute the sequence length of images for each sample
-lengths = [mindspore.mint.prod(sample, dim=1).sum() for sample in samples]
+prods = mindspore.mint.prod(image_grid_thw, dim=1)
+segment_ids = ops.arange(image_nums.shape[0]).repeat_interleave(image_nums.to(ms.int32))
+lengths = ops.segment_sum(prods, segment_ids).asnumpy().tolist()
```
```python
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


class Glm4vMoeisionMlp(mindspore.nn.Cell):
```
```python
self.norm1 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.norm2 = Glm4vMoeRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.attn = Glm4vMoeVisionAttention(config)
self.mlp = Glm4vMoeisionMlp(config, bias=False)
```
```python
def rot_pos_emb(self, grid_thw):
    pos_ids = []
    for t, h, w in grid_thw:
        t, h, w = int(t), int(h), int(w)
```
The explicit casting of tensor elements to Python integers using int() will break graph compilation (mindspore.jit). The test file for this model currently only runs in PyNative mode, which might be why this was not caught. To make the model compatible with graph mode for better performance, you should use tensor operations instead. For instance, mindspore.mint.arange(h) can accept a 0-d tensor h.
```diff
-t, h, w = int(t), int(h), int(w)
+t, h, w = t.item(), h.item(), w.item()
```
Code Review
This pull request introduces the glm4v_moe model, a multimodal model with a Mixture-of-Experts architecture. The implementation is adapted from the HuggingFace Transformers library for MindSpore. The changes include the model definition, configuration, and integration into the auto-model classes, along with a new test suite.
My review has identified a few issues:
- A critical bug in `_expand_inputs_for_generation` that would lead to a runtime error.
- A performance issue in `get_rope_index` due to redundant computations inside a loop.
- A typo in a class name (`Glm4vMoeisionMlp`).
- A developer-facing comment in the `moe` method that should be removed from production code.
I've provided suggestions to fix these issues. After addressing them, the PR should be in good shape.
```python
dict_to_expand[key] = _repeat_interleave_samples(
    dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
)
```
The code for key == 'second_per_grid_ts' attempts to call _repeat_interleave_samples on a list, but this function expects a tensor. This will cause a runtime error. The list should be converted to a tensor before being passed to _repeat_interleave_samples and then converted back to a list.
```diff
-dict_to_expand[key] = _repeat_interleave_samples(
-    dict_to_expand[key], lengths=list(video_nums), repeat_times=expand_size
-)
+tensor = mindspore.Tensor(dict_to_expand[key])
+tensor = _repeat_interleave_samples(
+    tensor, lengths=list(video_nums), repeat_times=expand_size
+)
+dict_to_expand[key] = tensor.tolist()
```
```python
r"""
CALL FOR CONTRIBUTION! I don't have time to optimise this right now, but expert weights need to be fused
to not have to do a loop here (deepseek has 256 experts soooo yeah).
"""
```
Force-pushed from 7b5b8f0 to 096fbf1.
Add:
1. add glm4v_moe model
2. add UT

Notes: not validated against the officially released weights (zai-org/GLM-4.6V) due to their large size (>100B parameters); ZeRO-3 distributed inference could be tried for validation.