Attention vs Add in LKA #32

@iumyx2612

In Table 3, changing attention (mul) to add reduces VAN's performance from 75.4 to 74.6, which seems like a really large drop to me. However, in the ablation study you state that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it okay to put it that way when the performance drop is 0.8?

Can't we treat add as a type of attention function? In Attention Mechanisms in Computer Vision: A Survey, we have the formula:

Attention = f(g(x), x)

Can't I treat the function f here as an addition operation?
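To make the comparison concrete, here is a minimal sketch of the two variants, assuming the DW-Conv → DW-D-Conv → 1×1 conv decomposition of LKA from the paper; the `combine` argument is hypothetical, added by me only to make the mul-vs-add ablation from Table 3 explicit:

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large Kernel Attention sketch: a 5x5 depth-wise conv, a 7x7
    depth-wise dilated conv (dilation 3), and a 1x1 conv produce the
    attention map, which is then combined with the input feature."""
    def __init__(self, dim, combine="mul"):
        super().__init__()
        self.dw_conv = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_d_conv = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        self.pw_conv = nn.Conv2d(dim, dim, 1)
        self.combine = combine  # "mul" = attention in the paper, "add" = Table 3 ablation

    def forward(self, x):
        attn = self.pw_conv(self.dw_d_conv(self.dw_conv(x)))
        if self.combine == "mul":
            return x * attn  # f(g(x), x) = g(x) * x
        return x + attn      # f(g(x), x) = g(x) + x

x = torch.randn(1, 64, 56, 56)
print(LKA(64, "mul")(x).shape, LKA(64, "add")(x).shape)  # both: [1, 64, 56, 56]
```

In the survey's notation, both variants use the same g; only the combining function f differs, which is exactly why I am asking whether add shouldn't also count as attention.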
