
TokenFusion is a multimodal token fusion method tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit utilization of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact. Extensive experiments are conducted on a variety of homogeneous and heterogeneous modalities and demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point cloud and images.
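The substitution step described above can be illustrated with a minimal NumPy sketch. It is not the code in models/modules.py: the function name `exchange_tokens`, the `threshold` default, and the use of an identity mapping in place of the learned inter-modal projection are all assumptions made for illustration.

```python
import numpy as np


def exchange_tokens(x_a, x_b, score_a, score_b, threshold=0.02):
    """Replace low-scoring tokens of one modality with the other's tokens.

    x_a, x_b:         (num_tokens, dim) features of two spatially aligned modalities.
    score_a, score_b: (num_tokens,) importance scores, assumed to lie in [0, 1].
    threshold:        hypothetical cut-off below which a token counts as uninformative.
    """
    mask_a = score_a < threshold  # uninformative tokens of modality A
    mask_b = score_b < threshold  # uninformative tokens of modality B
    # A learned projection would normally map the other modality's token into
    # this modality's feature space; the identity is used here for brevity.
    out_a = np.where(mask_a[:, None], x_b, x_a)
    out_b = np.where(mask_b[:, None], x_a, x_b)
    return out_a, out_b


# Toy example: 4 tokens with 2 channels each.
x_a = np.arange(8, dtype=np.float32).reshape(4, 2)
x_b = -x_a - 1.0
score_a = np.array([0.9, 0.01, 0.5, 0.0])
score_b = np.array([0.0, 0.8, 0.7, 0.9])
out_a, out_b = exchange_tokens(x_a, x_b, score_a, score_b)
```

In this toy run, tokens 1 and 3 of modality A are taken from modality B, and token 0 of modality B is taken from modality A. All other tokens pass through unchanged, which is why the single-modal architecture stays largely intact.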

Paper: Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang. Multimodal Token Fusion for Vision Transformers. In CVPR 2022.

The overall architecture of TokenFusion is shown below:

Dataset used: NYUDv2

  • Dataset size: color (RGB) images and depth images, with labels for 40 segmentation classes
    • Train: 795 samples
    • Test: 654 samples
  • Data format: image files
    • Note: data is processed in utils/datasets.py
.TokenFusion
├── README.md               # descriptions about TokenFusion
├── models
│   ├── mix_transformer.py  # definition of backbone model
│   ├── segformer.py        # definition of segmentation model
│   └── modules.py          # TokenFusion operations
├── utils
│   ├── datasets.py         # data loader
│   ├── helpers.py          # utility functions
│   ├── transforms.py       # data preprocessing functions
│   └── meter.py            # utility functions
├── eval.py                 # evaluation interface
├── cfg.py                  # configuration file
├── config.py               # configuration file

To Be Done

Launch

# inference example

python eval.py --checkpoint_path [CHECKPOINT_PATH]

The checkpoint can be downloaded here or from MindSpore Hub.

Result

result: IoU=54.8, ckpt= ./tokenfusion_ascend_v180_nyudv2_research_cv_acc54.8.ckpt
| Parameters | Ascend |
| --- | --- |
| Model | TokenFusion |
| Model Version | tokenfusion_seg_mitb3_nyudv2 |
| Resource | Ascend 910 |
| Uploaded Date | 2022-08-10 |
| MindSpore Version | 1.8.0 |
| Dataset | NYUDv2 |
| Outputs | probability |
| Accuracy | 1pc: 54.8% |
| Speed | 1pc: 1 s/step |
We set the seed inside datasets.py.
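
As a rough illustration of what fixing a seed involves, the sketch below seeds Python's and NumPy's random number generators. The helper name `set_global_seed` is hypothetical, and the actual code in datasets.py may differ.

```python
import random

import numpy as np


def set_global_seed(seed):
    """Seed the Python and NumPy generators so random transforms are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    # In a MindSpore script, mindspore.set_seed(seed) would typically be
    # called as well; it is left out so this sketch runs without MindSpore.


set_global_seed(0)
first = (random.random(), float(np.random.rand()))
set_global_seed(0)
second = (random.random(), float(np.random.rand()))
```

After reseeding with the same value, `first` and `second` are identical, which is the property that makes evaluation runs reproducible.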

Please check the official homepage.
