feat(transformers/models): add models of Superglue and Superpoint by JIJIARONGjijiarong · Pull Request #1348 · mindspore-lab/mindone

JIJIARONGjijiarong · 2025-10-09T07:05:35Z

Add

mindone.transformers.SuperGluePreTrainedModel
mindone.transformers.SuperGlueForKeypointMatching

Usage

from mindone.transformers import SuperGlueForKeypointMatching, AutoImageProcessor
from mindspore import Tensor
from PIL import Image
import requests

url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_78916675_4568141288.jpg?raw=true"
image1 = Image.open(requests.get(url, stream=True).raw)
url = "https://github.com/magicleap/SuperGluePretrainedNetwork/blob/master/assets/phototourism_sample_images/london_bridge_19481797_2295892421.jpg?raw=true"
image2 = Image.open(requests.get(url, stream=True).raw)
images = [image1, image2]

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = SuperGlueForKeypointMatching.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor(images, return_tensors="np")
outputs = model(**{k: Tensor(v) for k, v in inputs.items()})

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.6.0 in PyNative mode.

pipeline speed

model	weights loading(s)	precision	fps
SuperGlueForKeypointMatching	14.16	bf16	0.24
SuperGlueForKeypointMatching	14.25	fp16	0.23
SuperGlueForKeypointMatching	14.72	fp32	0.23
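The PR does not say how the fps column was measured. A minimal sketch of one common approach, assuming a few warm-up calls followed by a timed loop; `measure_fps` and the dummy `step` callable are hypothetical names, not part of mindone:

```python
import time


def measure_fps(step, warmup=2, iters=10):
    """Return calls per second for `step`, excluding warm-up calls."""
    for _ in range(warmup):
        step()  # warm-up: first calls often include one-off setup cost
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed


# Stand-in workload; in practice `step` would wrap one model forward pass.
fps = measure_fps(lambda: sum(range(1000)))
print(f"{fps:.2f} calls per second")
```

Excluding warm-up matters when the first call carries compilation or caching overhead, which would otherwise pull the average down.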

Add

mindone.transformers.SuperPointForKeypointDetection
mindone.transformers.SuperPointPreTrainedModel

Usage

from mindone.transformers import AutoImageProcessor, SuperPointForKeypointDetection
from mindspore import Tensor
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")

inputs = processor(image, return_tensors="np")
outputs = model(**{k: Tensor(v) for k, v in inputs.items()})

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.6.0 in PyNative mode.

pipeline speed

model	weights loading(s)	precision	fps
SuperPointForKeypointDetection	0.13	bf16	4.79
SuperPointForKeypointDetection	0.13	fp16	0.79
SuperPointForKeypointDetection	0.12	fp32	0.79

# Conflicts: # mindone/transformers/models/auto/configuration_auto.py

gemini-code-assist · 2025-10-09T07:05:55Z

Summary of Changes

Hello @JIJIARONGjijiarong, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the mindone.transformers library by incorporating two advanced computer vision models: SuperGlue and SuperPoint. SuperPoint provides robust keypoint detection and descriptor extraction, while SuperGlue leverages these keypoints for highly accurate feature matching between images. This addition broadens the library's utility for tasks requiring precise visual correspondence, such as augmented reality, 3D reconstruction, and robotics.

Highlights

SuperGlue Model Integration: Added the SuperGlueForKeypointMatching model, which performs feature matching using Graph Neural Networks and Optimal Transport.
SuperPoint Model Integration: Introduced the SuperPointForKeypointDetection model for self-supervised keypoint detection and description.
Auto-Configuration and Auto-Modeling Support: Updated the mindone.transformers.models.auto module to automatically recognize and load configurations and models for SuperGlue and SuperPoint.
Comprehensive Unit Tests: Included new test files for both SuperGlue and SuperPoint models to ensure functional correctness and compatibility between PyTorch and MindSpore implementations.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces the SuperGlue and SuperPoint models, ported from the HuggingFace Transformers library. The implementation contains several critical bugs related to tensor shape manipulation and operations that could lead to runtime errors or incorrect model behavior. I have identified these issues and provided suggestions for fixes. Additionally, there are some minor bugs in the newly added test files that I've also pointed out.

gemini-code-assist · 2025-10-09T07:15:51Z

mindone/transformers/models/superglue/modeling_superglue.py

+        if keypoints.shape[2] == 0:  # no keypoints
+            shape = keypoints.shape[:-1]
+            return (
+                mint.full([shape, -1], dtype=mindspore.int32),


The shape argument for mint.full is incorrect. It is currently [shape, -1], which creates a list containing a tuple and an integer. This is not a valid shape for creating a tensor. The brackets [] should be removed to pass the shape tuple directly.

Suggested change

-                mint.full([shape, -1], dtype=mindspore.int32),
+                mint.full(shape, -1, dtype=mindspore.int32),
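The difference between the two calls can be sketched with NumPy, used here only as a stand-in for the MindSpore call (`np.full(shape, fill_value, dtype=...)` takes its arguments in the same order as the suggested line). The keypoints array and its `(1, 2, 0, 2)` shape are hypothetical:

```python
import numpy as np

# Hypothetical empty-keypoints input: the keypoint axis (index 2) has length 0.
keypoints = np.zeros((1, 2, 0, 2), dtype=np.float32)
shape = keypoints.shape[:-1]  # a plain tuple of ints: (1, 2, 0)

# Suggested form: the shape tuple and the fill value are separate arguments.
matches = np.full(shape, -1, dtype=np.int32)
print(matches.shape, matches.dtype)

# Original form: the shape tuple and the fill value end up inside one list,
# so the first argument is no longer a sequence of ints and no fill value is passed.
bad_shape = [shape, -1]
print(bad_shape)
```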

gemini-code-assist · 2025-10-09T07:15:51Z

mindone/transformers/models/superglue/modeling_superglue.py

+        else:
+            extended_attention_mask = mint.ones(
+                (batch_size, num_keypoints),
+            )


The extended_attention_mask is not created correctly when the input mask is None.

The batch dimension of the created tensor is incorrect. It should be descriptors.shape[0] (which is batch_size * 2) instead of batch_size.

The created 2D mask is not passed through self.get_extended_attention_mask to convert it into the 4D format expected by the attention layers, which will cause a shape mismatch error during the forward pass.

Suggested change

-        else:
-            extended_attention_mask = mint.ones(
-                (batch_size, num_keypoints),
-            )
+        else:
+            mask = mint.ones((descriptors.shape[0], num_keypoints))
+            input_shape = descriptors.shape
+            extended_attention_mask = self.get_extended_attention_mask(mask, input_shape)
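As a rough illustration of the 2D-to-4D conversion this comment refers to, here is a NumPy sketch. It assumes the convention common in Hugging Face-style models, where a `(batch, seq)` mask of ones and zeros becomes an additive `(batch, 1, 1, seq)` mask; `extend_attention_mask` is a hypothetical helper, not the actual `get_extended_attention_mask` implementation:

```python
import numpy as np


def extend_attention_mask(mask, dtype=np.float32):
    """Turn a (batch, seq) 1/0 mask into an additive (batch, 1, 1, seq) mask."""
    extended = mask[:, None, None, :].astype(dtype)
    # Kept positions add 0 to the attention scores; masked positions add a
    # very large negative value so that softmax drives them towards zero.
    return (1.0 - extended) * np.finfo(dtype).min


batch_size, num_keypoints = 2, 3
# Two images per pair are stacked on the batch axis, hence batch_size * 2 rows.
mask = np.ones((batch_size * 2, num_keypoints), dtype=np.float32)
mask[0, -1] = 0  # pretend the last keypoint of the first image is padding
extended = extend_attention_mask(mask)
print(extended.shape)
```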

gemini-code-assist · 2025-10-09T07:15:51Z

mindone/transformers/models/superpoint/modeling_superpoint.py

+            x_nchw = mint.unsqueeze(x, 0)
+            pooled_nchw = mint.functional.max_pool2d(x_nchw, kernel_size=kernel_size, stride=stride, padding=padding)
+            output = mint.squeeze(pooled_nchw, 0)


The max_pool implementation for 3D tensors is incorrect. Reshaping a (B, H, W) tensor to (1, B, H, W) with mint.unsqueeze(x, 0) causes max_pool2d to treat the batch dimension as the channel dimension, performing pooling across the batch. Pooling should be independent for each sample. The input to max_pool2d should be (B, 1, H, W), which can be achieved by using mint.unsqueeze(x, 1) and then squeezing dimension 1.

Suggested change

-            x_nchw = mint.unsqueeze(x, 0)
-            pooled_nchw = mint.functional.max_pool2d(x_nchw, kernel_size=kernel_size, stride=stride, padding=padding)
-            output = mint.squeeze(pooled_nchw, 0)
+            x_nchw = mint.unsqueeze(x, 1)
+            pooled_nchw = mint.functional.max_pool2d(x_nchw, kernel_size=kernel_size, stride=stride, padding=padding)
+            output = mint.squeeze(pooled_nchw, 1)
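A small NumPy sketch for checking the layout question raised in this comment. `max_pool2d_nchw` is a hypothetical helper (stride equal to the kernel size, no padding), not the MindSpore operator, and it assumes the usual semantics of 2D max pooling, in which every (N, C) plane is pooled on its own. Under that assumption the two layouts give the same values, so it may be worth verifying against the real operator whether the suggested change affects results or mainly makes the intended (B, 1, H, W) layout explicit:

```python
import numpy as np


def max_pool2d_nchw(x, k):
    """Non-overlapping k x k max pooling over an (N, C, H, W) array."""
    n, c, h, w = x.shape
    # Split H and W into (blocks, k) and reduce over the two window axes only.
    return x.reshape(n, c, h // k, k, w // k, k).max(axis=(3, 5))


x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)  # (B, H, W)

as_channels = max_pool2d_nchw(x[None], 2)[0]     # (1, B, H, W): batch treated as channels
as_batch = max_pool2d_nchw(x[:, None], 2)[:, 0]  # (B, 1, H, W): one channel per sample

print(as_channels.shape, as_batch.shape)
print(np.array_equal(as_channels, as_batch))
```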

gemini-code-assist · 2025-10-09T07:15:51Z

mindone/transformers/models/superpoint/modeling_superpoint.py

+        keypoints = keypoints / mindspore.tensor(
+            [width, height],
+        )


There is a potential dtype mismatch in this division operation. keypoints is a float tensor, while mindspore.tensor([width, height]) creates an integer tensor. This can lead to type errors or unexpected behavior. To ensure correctness and robustness, the divisor should be explicitly cast to the same dtype as keypoints.

Suggested change

-        keypoints = keypoints / mindspore.tensor(
-            [width, height],
-        )
+        keypoints = keypoints / mindspore.tensor(
+            [width, height], dtype=keypoints.dtype
+        )
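The effect of an untyped divisor can be illustrated with NumPy. This is an analogy only: MindSpore's type-promotion rules may differ, and the keypoint values and image size below are made up:

```python
import numpy as np

keypoints = np.array([[32.0, 48.0], [64.0, 24.0]], dtype=np.float32)
width, height = 640, 480  # hypothetical image size

# Divisor built from Python ints: it gets an integer dtype, and the
# float32 / integer division is promoted to a wider float type.
implicit = keypoints / np.array([width, height])

# Divisor created with the keypoints' dtype: the result stays float32.
explicit = keypoints / np.array([width, height], dtype=keypoints.dtype)

print(implicit.dtype, explicit.dtype)
```

Pinning the divisor's dtype keeps the normalized keypoints in the same precision as the rest of the forward pass.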

gemini-code-assist · 2025-10-09T07:15:52Z

tests/transformers_tests/models/superglue/test_modeling_superglue.py

+        (pixel_values,),
+        {},
+        {
+            "keypoints": 2,


The index for keypoints in outputs_map is incorrect. In the KeypointMatchingOutput dataclass, the fields are ordered as loss, matches, matching_scores, keypoints, ... . Therefore, keypoints is at index 3, not 2.

Suggested change

-            "keypoints": 2,
+            "keypoints": 3,

gemini-code-assist · 2025-10-09T07:15:52Z

tests/transformers_tests/models/superpoint/test_modeling_superpoint.py

+        (pixel_values,),
+        {},
+        {
+            "keypoints": 0,


The index for keypoints in outputs_map is incorrect. In the SuperPointKeypointDescriptionOutput dataclass, the fields are ordered as loss, keypoints, ... . Therefore, keypoints is at index 1, not 0.

Suggested change

-            "keypoints": 0,
+            "keypoints": 1,
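Which index is right depends on how the model turns its output into a tuple, something the thread does not state. The sketch below uses two hypothetical dataclasses that copy only the field order quoted in the two comments above, and compares the declared position of `keypoints` with its position after `None` fields (such as an unset `loss`) are dropped:

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MatchingOutputSketch:  # hypothetical, mirrors only the quoted field order
    loss: Optional[Any] = None
    matches: Optional[Any] = None
    matching_scores: Optional[Any] = None
    keypoints: Optional[Any] = None


@dataclass
class DetectionOutputSketch:  # hypothetical, mirrors only the quoted field order
    loss: Optional[Any] = None
    keypoints: Optional[Any] = None


def declared_index(output_cls, name):
    """Position of `name` among the declared fields."""
    return [f.name for f in fields(output_cls)].index(name)


def tuple_index(output, name):
    """Position of `name` once fields whose value is None are dropped."""
    kept = [f.name for f in fields(output) if getattr(output, f.name) is not None]
    return kept.index(name)


matching = MatchingOutputSketch(matches="m", matching_scores="s", keypoints="k")
detection = DetectionOutputSketch(keypoints="k")

print(declared_index(MatchingOutputSketch, "keypoints"), tuple_index(matching, "keypoints"))
print(declared_index(DetectionOutputSketch, "keypoints"), tuple_index(detection, "keypoints"))
```

If the ported models build their tuple output by skipping `None` entries and no loss is computed in the test, the second number in each pair is the one the test should use; checking the return statement of each `construct` method settles it.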

SamitHuang

Why is the fps for fp16 the same as for fp32? Is it a typo?

jiarongji and others added 5 commits October 9, 2025 14:32

add superglue superpoint

35139ee

# Conflicts: # mindone/transformers/models/auto/configuration_auto.py

add superglue and superpoint

bda0024

add Super bugfix

0d47c84

super bugfix

1cdf70c

code clean

e6c9c0b

JIJIARONGjijiarong requested a review from vigo999 as a code owner October 9, 2025 07:05

vigo999 requested review from SamitHuang and zhanghuiyao October 9, 2025 07:09

vigo999 assigned JIJIARONGjijiarong Oct 9, 2025

vigo999 added the "new model" label (add new model to mindone) Oct 9, 2025

vigo999 added this to mindone Oct 9, 2025

vigo999 moved this to In Progress in mindone Oct 9, 2025

vigo999 mentioned this pull request Oct 9, 2025

hf transformers 4.50 model problem tracking #1327

Open

gemini-code-assist bot reviewed Oct 9, 2025


SamitHuang reviewed Oct 9, 2025


vigo999 approved these changes Oct 18, 2025


vigo999 added this to the v0.4.x milestone Oct 18, 2025

JIJIARONGjijiarong wants to merge 5 commits into mindspore-lab:master from JIJIARONGjijiarong:Super


4 participants
