# Attention-Mechanism-Pytorch
This repository contains PyTorch implementations of many attention mechanisms, each with a link to its paper and a minimal usage example.
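
The package is published on PyPI as `AttentionMechanism` (`pip install AttentionMechanism`). Most modules below are drop-in `torch.nn.Module`s; the sketch below shows the common pattern, mirroring the per-module usage snippets that follow (SEAttention is just one example from the catalog):

```python
# Quick-start sketch: the general usage pattern shared by the modules below.
import torch
from AttentionMechanism.model.attention.SEAttention import SEAttention

x = torch.randn(8, 512, 7, 7)                 # a batch of CNN feature maps (B, C, H, W)
attn = SEAttention(channel=512, reduction=8)  # instantiate the attention module
print(attn(x).shape)                          # feature-map modules preserve the input shape
```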
# Change Log
- [x] 2024-08-12: Published initial attention models.
# Table of Contents
- [Attention Series](#attention-series)
  - [1. External Attention Usage](#1-external-attention-usage)
  - [2. Self Attention Usage](#2-self-attention-usage)
  - [3. Simplified Self Attention Usage](#3-simplified-self-attention-usage)
  - [4. Squeeze-and-Excitation Attention Usage](#4-squeeze-and-excitation-attention-usage)
  - [5. SK Attention Usage](#5-sk-attention-usage)
  - [6. CBAM Attention Usage](#6-cbam-attention-usage)
  - [7. BAM Attention Usage](#7-bam-attention-usage)
  - [8. ECA Attention Usage](#8-eca-attention-usage)
  - [9. DANet Attention Usage](#9-danet-attention-usage)
  - [10. Pyramid Split Attention (PSA) Usage](#10-pyramid-split-attention-usage)
  - [11. Efficient Multi-Head Self-Attention (EMSA) Usage](#11-efficient-multi-head-self-attention-usage)
  - [12. Shuffle Attention Usage](#12-shuffle-attention-usage)
  - [13. MUSE Attention Usage](#13-muse-attention-usage)
  - [14. SGE Attention Usage](#14-sge-attention-usage)
  - [15. A2 Attention Usage](#15-a2-attention-usage)
  - [16. AFT Attention Usage](#16-aft-attention-usage)
  - [17. Outlook Attention Usage](#17-outlook-attention-usage)
  - [18. ViP Attention Usage](#18-vip-attention-usage)
  - [19. CoAtNet Attention Usage](#19-coatnet-attention-usage)
  - [20. HaloNet Attention Usage](#20-halonet-attention-usage)
  - [21. Polarized Self-Attention Usage](#21-polarized-self-attention-usage)
  - [22. CoTAttention Usage](#22-cotattention-usage)
  - [23. Residual Attention Usage](#23-residual-attention-usage)
  - [24. S2 Attention Usage](#24-s2-attention-usage)
  - [25. GFNet Attention Usage](#25-gfnet-attention-usage)
  - [26. Triplet Attention Usage](#26-tripletattention-usage)
  - [27. Coordinate Attention Usage](#27-coordinate-attention-usage)
  - [28. MobileViT Attention Usage](#28-mobilevit-attention-usage)
  - [29. ParNet Attention Usage](#29-parnet-attention-usage)
  - [30. UFO Attention Usage](#30-ufo-attention-usage)
  - [31. ACmix Attention Usage](#31-acmix-attention-usage)
  - [32. MobileViTv2 Attention Usage](#32-mobilevitv2-attention-usage)
  - [33. DAT Attention Usage](#33-dat-attention-usage)
  - [34. CrossFormer Attention Usage](#34-crossformer-attention-usage)
  - [35. MOATransformer Attention Usage](#35-moatransformer-attention-usage)
  - [36. CrissCrossAttention Attention Usage](#36-crisscrossattention-attention-usage)
  - [37. Axial_attention Attention Usage](#37-axial_attention-attention-usage)
  - [38. Frequency Channel Attention Usage](#38-frequency-channel-attention-usage)
  - [39. Attention Augmented Convolutional Networks Usage](#39-attention-augmented-convolutional-networks-usage)
  - [40. Global Context Attention Usage](#40-global-context-attention-usage)
  - [41. Linear Context Transform Attention Usage](#41-linear-context-transform-attention-usage)
  - [42. Gated Channel Transformation Usage](#42-gated-channel-transformation-usage)
  - [43. Gaussian Context Attention Usage](#43-gaussian-context-attention-usage)
- [MLP Series](#mlp-series)
  - [1. RepMLP Usage](#1-repmlp-usage)
  - [2. MLP-Mixer Usage](#2-mlp-mixer-usage)
  - [3. ResMLP Usage](#3-resmlp-usage)
  - [4. gMLP Usage](#4-gmlp-usage)
  - [5. sMLP Usage](#5-smlp-usage)
  - [6. vip-mlp Usage](#6-vip-mlp-usage)
***
### 1. External Attention Usage
#### 1.1. Paper
["Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks"](https://arxiv.org/abs/2105.02358)
#### 1.2. Overview
![](.//AttentionMechanism/model/img/External_Attention.png)
#### 1.3. Usage Code
```python
from AttentionMechanism.model.attention.ExternalAttention import ExternalAttention
import torch

x = torch.randn(50, 49, 512)  # (batch, tokens, d_model)
ea = ExternalAttention(d_model=512, S=8)  # S: number of external memory units
output = ea(x)
print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 2. Self Attention Usage
#### 2.1. Paper
["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf)
#### 2.2. Overview
![](.//AttentionMechanism/model/img/SA.png)
#### 2.3. Usage Code
```python
from AttentionMechanism.model.attention.SelfAttention import ScaledDotProductAttention
import torch

x = torch.randn(50, 49, 512)  # (batch, tokens, d_model)
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)  # h: number of heads
output = sa(x, x, x)  # queries, keys, values
print(output.shape)  # torch.Size([50, 49, 512])
```
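
Because the module follows the standard multi-head formulation, queries and keys/values need not come from the same sequence. A hedged cross-attention sketch, assuming the `(queries, keys, values)` forward signature shown above also accepts differing sequence lengths:

```python
from AttentionMechanism.model.attention.SelfAttention import ScaledDotProductAttention
import torch

q = torch.randn(50, 10, 512)   # 10 query tokens per batch element
kv = torch.randn(50, 49, 512)  # 49 key/value tokens per batch element
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
out = sa(q, kv, kv)            # cross-attention: queries attend over kv
print(out.shape)               # expected: torch.Size([50, 10, 512])
```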
***
### 3. Simplified Self Attention Usage
#### 3.1. Paper
[SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks (ICML 2021)](https://proceedings.mlr.press/v139/yang21o/yang21o.pdf)
#### 3.2. Overview
![](.//AttentionMechanism/model/img/SimAttention.png)
#### 3.3. Usage Code
```python
from AttentionMechanism.model.attention.SimplifiedSelfAttention import SimplifiedScaledDotProductAttention
import torch

x = torch.randn(50, 49, 512)  # (batch, tokens, d_model)
ssa = SimplifiedScaledDotProductAttention(d_model=512, h=8)
output = ssa(x, x, x)
print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 4. Squeeze-and-Excitation Attention Usage
#### 4.1. Paper
["Squeeze-and-Excitation Networks"](https://arxiv.org/abs/1709.01507)
#### 4.2. Overview
![](.//AttentionMechanism/model/img/SE.png)
#### 4.3. Usage Code
```python
from AttentionMechanism.model.attention.SEAttention import SEAttention
import torch

x = torch.randn(50, 512, 7, 7)  # (batch, channels, H, W)
se = SEAttention(channel=512, reduction=8)  # reduction: bottleneck ratio of the excitation MLP
output = se(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
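
SE is typically inserted after the last convolution of a residual block, just before the skip connection is added. A minimal integration sketch (not from this repository; it only assumes the `SEAttention` constructor shown above):

```python
# Hedged integration sketch: SEAttention inside a simple residual block,
# following the placement described in the SENet paper.
import torch
from torch import nn
from AttentionMechanism.model.attention.SEAttention import SEAttention

class SEBasicBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEAttention(channel=channels, reduction=reduction)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)  # recalibrate channels before the residual add
        return self.relu(out + x)

block = SEBasicBlock(64)
print(block(torch.randn(2, 64, 56, 56)).shape)  # torch.Size([2, 64, 56, 56])
```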
***
### 5. SK Attention Usage
#### 5.1. Paper
["Selective Kernel Networks"](https://arxiv.org/pdf/1903.06586.pdf)
#### 5.2. Overview
![](.//AttentionMechanism/model/img/SK.png)
#### 5.3. Usage Code
```python
from AttentionMechanism.model.attention.SKAttention import SKAttention
import torch

x = torch.randn(50, 512, 7, 7)
sk = SKAttention(channel=512, reduction=8)
output = sk(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 6. CBAM Attention Usage
#### 6.1. Paper
["CBAM: Convolutional Block Attention Module"](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf)
#### 6.2. Overview
![](.//AttentionMechanism/model/img/CBAM1.png)
![](.//AttentionMechanism/model/img/CBAM2.png)
#### 6.3. Usage Code
```python
from AttentionMechanism.model.attention.CBAM import CBAMBlock
import torch

x = torch.randn(50, 512, 7, 7)
kernel_size = x.shape[2]  # spatial-attention conv kernel spans the full 7x7 feature map
cbam = CBAMBlock(channel=512, reduction=16, kernel_size=kernel_size)
output = cbam(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 7. BAM Attention Usage
#### 7.1. Paper
["BAM: Bottleneck Attention Module"](https://arxiv.org/pdf/1807.06514.pdf)
#### 7.2. Overview
![](.//AttentionMechanism/model/img/BAM.png)
#### 7.3. Usage Code
```python
from AttentionMechanism.model.attention.BAM import BAMBlock
import torch

x = torch.randn(50, 512, 7, 7)
bam = BAMBlock(channel=512, reduction=16, dia_val=2)  # dia_val: dilation of the spatial-branch convs
output = bam(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 8. ECA Attention Usage
#### 8.1. Paper
["ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks"](https://arxiv.org/pdf/1910.03151.pdf)
#### 8.2. Overview
![](.//AttentionMechanism/model/img/ECA.png)
#### 8.3. Usage Code
```python
from AttentionMechanism.model.attention.ECAAttention import ECAAttention
import torch

x = torch.randn(50, 512, 7, 7)
eca = ECAAttention(kernel_size=3)  # 1-D conv kernel over the pooled channel descriptor
output = eca(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 9. DANet Attention Usage
#### 9.1. Paper
["Dual Attention Network for Scene Segmentation"](https://arxiv.org/pdf/1809.02983.pdf)
#### 9.2. Overview
![](.//AttentionMechanism/model/img/danet.png)
#### 9.3. Usage Code
```python
from AttentionMechanism.model.attention.DANet import DAModule
import torch

x = torch.randn(50, 512, 7, 7)
danet = DAModule(d_model=512, kernel_size=3, H=7, W=7)
print(danet(x).shape)  # torch.Size([50, 512, 7, 7])
```
***
### 10. Pyramid Split Attention Usage
#### 10.1. Paper
["EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network"](https://arxiv.org/pdf/2105.14447.pdf)
#### 10.2. Overview
![](.//AttentionMechanism/model/img/psa.png)
#### 10.3. Usage Code
```python
from AttentionMechanism.model.attention.PSA import PSA
import torch

x = torch.randn(50, 512, 7, 7)
psa = PSA(channel=512, reduction=8)
output = psa(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 11. Efficient Multi-Head Self-Attention Usage
#### 11.1. Paper
["ResT: An Efficient Transformer for Visual Recognition"](https://arxiv.org/abs/2105.13677)
#### 11.2. Overview
![](.//AttentionMechanism/model/img/EMSA.png)
#### 11.3. Usage Code
```python
from AttentionMechanism.model.attention.EMSA import EMSA
import torch

x = torch.randn(50, 64, 512)  # 64 tokens = the 8x8 spatial positions given by H and W
emsa = EMSA(d_model=512, d_k=512, d_v=512, h=8, H=8, W=8, ratio=2, apply_transform=True)
output = emsa(x, x, x)
print(output.shape)  # torch.Size([50, 64, 512])
```
***
### 12. Shuffle Attention Usage
#### 12.1. Paper
["SA-NET: SHUFFLE ATTENTION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS"](https://arxiv.org/pdf/2102.00240.pdf)
#### 12.2. Overview
![](.//AttentionMechanism/model/img/ShuffleAttention.jpg)
#### 12.3. Usage Code
```python
from AttentionMechanism.model.attention.ShuffleAttention import ShuffleAttention
import torch

x = torch.randn(50, 512, 7, 7)
sa = ShuffleAttention(channel=512, G=8)  # G: number of channel groups
output = sa(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 13. MUSE Attention Usage
#### 13.1. Paper
["MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning"](https://arxiv.org/abs/1911.09483)
#### 13.2. Overview
![](.//AttentionMechanism/model/img/MUSE.png)
#### 13.3. Usage Code
```python
from AttentionMechanism.model.attention.MUSEAttention import MUSEAttention
import torch

x = torch.randn(50, 49, 512)
muse = MUSEAttention(d_model=512, d_k=512, d_v=512, h=8)
output = muse(x, x, x)
print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 14. SGE Attention Usage
#### 14.1. Paper
[Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks](https://arxiv.org/pdf/1905.09646.pdf)
#### 14.2. Overview
![](.//AttentionMechanism/model/img/SGE.png)
#### 14.3. Usage Code
```python
from AttentionMechanism.model.attention.SGE import SpatialGroupEnhance
import torch

x = torch.randn(50, 512, 7, 7)
sge = SpatialGroupEnhance(groups=8)
output = sge(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 15. A2 Attention Usage
#### 15.1. Paper
[A2-Nets: Double Attention Networks](https://arxiv.org/pdf/1810.11579.pdf)
#### 15.2. Overview
![](.//AttentionMechanism/model/img/A2.png)
#### 15.3. Usage Code
```python
from AttentionMechanism.model.attention.A2Atttention import DoubleAttention
import torch

x = torch.randn(50, 512, 7, 7)
a2 = DoubleAttention(512, 128, 128, True)  # (in_channels, c_m, c_n, reconstruct)
output = a2(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```

***
### 16. AFT Attention Usage
#### 16.1. Paper
[An Attention Free Transformer](https://arxiv.org/pdf/2105.14103v1.pdf)
#### 16.2. Overview
![](.//AttentionMechanism/model/img/AFT.jpg)
#### 16.3. Usage Code
```python
from AttentionMechanism.model.attention.AFT import AFT_FULL
import torch

x = torch.randn(50, 49, 512)
aft_full = AFT_FULL(d_model=512, n=49)  # n: sequence length
output = aft_full(x)
print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 17. Outlook Attention Usage
#### 17.1. Paper
[VOLO: Vision Outlooker for Visual Recognition](https://arxiv.org/abs/2106.13112)
#### 17.2. Overview
![](.//AttentionMechanism/model/img/OutlookAttention.png)
#### 17.3. Usage Code
```python
from AttentionMechanism.model.attention.OutlookAttention import OutlookAttention
import torch

x = torch.randn(50, 28, 28, 512)  # channels-last: (batch, H, W, dim)
outlook = OutlookAttention(dim=512)
output = outlook(x)
print(output.shape)  # torch.Size([50, 28, 28, 512])
```
***
### 18. ViP Attention Usage
#### 18.1. Paper
[Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition](https://arxiv.org/abs/2106.12368)
#### 18.2. Overview
![](.//AttentionMechanism/model/img/ViP.png)
#### 18.3. Usage Code
```python
from AttentionMechanism.model.attention.ViP import WeightedPermuteMLP
import torch

x = torch.randn(64, 8, 8, 512)  # channels-last: (batch, H, W, dim)
seg_dim = 8
vip = WeightedPermuteMLP(512, seg_dim)
out = vip(x)
print(out.shape)  # torch.Size([64, 8, 8, 512])
```
***
### 19. CoAtNet Attention Usage
#### 19.1. Paper
[CoAtNet: Marrying Convolution and Attention for All Data Sizes](https://arxiv.org/abs/2106.04803)
#### 19.2. Overview
No overview figure available.
#### 19.3. Usage Code
```python
from AttentionMechanism.model.attention.CoAtNet import CoAtNet
import torch

x = torch.randn(1, 3, 224, 224)
coatnet = CoAtNet(in_ch=3, image_size=224)
out = coatnet(x)
print(out.shape)
```
***
### 20. HaloNet Attention Usage
#### 20.1. Paper
[Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/pdf/2103.12731.pdf)
#### 20.2. Overview
![](.//AttentionMechanism/model/img/HaloNet.png)
#### 20.3. Usage Code
```python
from AttentionMechanism.model.attention.HaloAttention import HaloAttention
import torch

x = torch.randn(1, 512, 8, 8)
halo = HaloAttention(dim=512,
                     block_size=2,  # non-overlapping query block size
                     halo_size=1)   # halo of keys/values around each block
output = halo(x)
print(output.shape)  # torch.Size([1, 512, 8, 8])
```
***
### 21. Polarized Self-Attention Usage
#### 21.1. Paper
[Polarized Self-Attention: Towards High-quality Pixel-wise Regression](https://arxiv.org/abs/2107.00782)
#### 21.2. Overview
![](.//AttentionMechanism/model/img/PoSA.png)
#### 21.3. Usage Code
```python
from AttentionMechanism.model.attention.PolarizedSelfAttention import ParallelPolarizedSelfAttention, SequentialPolarizedSelfAttention
import torch

x = torch.randn(1, 512, 7, 7)
psa = SequentialPolarizedSelfAttention(channel=512)  # or ParallelPolarizedSelfAttention
output = psa(x)
print(output.shape)  # torch.Size([1, 512, 7, 7])
```
***
### 22. CoTAttention Usage
#### 22.1. Paper
[Contextual Transformer Networks for Visual Recognition---arXiv 2021.07.26](https://arxiv.org/abs/2107.12292)
#### 22.2. Overview
![](.//AttentionMechanism/model/img/CoT.png)
#### 22.3. Usage Code
```python
from AttentionMechanism.model.attention.CoTAttention import CoTAttention
import torch

x = torch.randn(50, 512, 7, 7)
cot = CoTAttention(dim=512, kernel_size=3)
output = cot(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 23. Residual Attention Usage
#### 23.1. Paper
[Residual Attention: A Simple but Effective Method for Multi-Label Recognition---ICCV2021](https://arxiv.org/abs/2108.02456)
#### 23.2. Overview
![](.//AttentionMechanism/model/img/ResAtt.png)
#### 23.3. Usage Code
```python
from AttentionMechanism.model.attention.ResidualAttention import ResidualAttention
import torch

x = torch.randn(50, 512, 7, 7)
resatt = ResidualAttention(channel=512, num_class=1000, la=0.2)  # la: weighting lambda
output = resatt(x)  # produces per-class scores of shape (batch, num_class)
print(output.shape)
```
***
### 24. S2 Attention Usage
#### 24.1. Paper
[S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision---arXiv 2021.08.02](https://arxiv.org/abs/2108.01072)
#### 24.2. Overview
![](.//AttentionMechanism/model/img/S2Attention.png)
#### 24.3. Usage Code
```python
from AttentionMechanism.model.attention.S2Attention import S2Attention
import torch

x = torch.randn(50, 512, 7, 7)
s2att = S2Attention(channels=512)
output = s2att(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 25. GFNet Attention Usage
#### 25.1. Paper
[Global Filter Networks for Image Classification---arXiv 2021.07.01](https://arxiv.org/abs/2107.00645)
#### 25.2. Overview
![](.//AttentionMechanism/model/img/GFNet.jpg)
#### 25.3. Usage Code - Implemented by [Wenliang Zhao (Author)](https://scholar.google.com/citations?user=lyPWvuEAAAAJ&hl=en)
```python
from AttentionMechanism.model.attention.gfnet import GFNet
import torch

x = torch.randn(1, 3, 224, 224)
gfnet = GFNet(embed_dim=384, img_size=224, patch_size=16, num_classes=1000)
out = gfnet(x)
print(out.shape)  # torch.Size([1, 1000])
```
***
### 26. TripletAttention Usage
#### 26.1. Paper
[Rotate to Attend: Convolutional Triplet Attention Module---CVPR 2021](https://arxiv.org/abs/2010.03045)
#### 26.2. Overview
![](.//AttentionMechanism/model/img/triplet.png)
#### 26.3. Usage Code - Implemented by [digantamisra98](https://github.com/digantamisra98)
```python
from AttentionMechanism.model.attention.TripletAttention import TripletAttention
import torch

x = torch.randn(50, 512, 7, 7)
triplet = TripletAttention()  # nearly parameter-free; no channel argument needed
output = triplet(x)
print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 27. Coordinate Attention Usage
#### 27.1. Paper
[Coordinate Attention for Efficient Mobile Network Design---CVPR 2021](https://arxiv.org/abs/2103.02907)
#### 27.2. Overview
![](.//AttentionMechanism/model/img/CoordAttention.png)
#### 27.3. Usage Code - Implemented by [Andrew-Qibin](https://github.com/Andrew-Qibin)
```python
from AttentionMechanism.model.attention.CoordAtt import CoordAtt
import torch

x = torch.rand(2, 96, 56, 56)
inp_dim, oup_dim = 96, 96
coord_attention = CoordAtt(inp_dim, oup_dim, reduction=32)
output = coord_attention(x)
print(output.shape)  # torch.Size([2, 96, 56, 56])
```
***
### 28. MobileViT Attention Usage
#### 28.1. Paper
[MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05](https://arxiv.org/abs/2110.02178)
#### 28.2. Overview
![](.//AttentionMechanism/model/img/MobileViTAttention.png)
#### 28.3. Usage Code
```python
from AttentionMechanism.model.attention.MobileViTAttention import MobileViTAttention
import torch

if __name__ == '__main__':
    m = MobileViTAttention()
    x = torch.randn(1, 3, 49, 49)
    output = m(x)
    print(output.shape)  # torch.Size([1, 3, 49, 49])
```
***
### 29. ParNet Attention Usage
#### 29.1. Paper
[Non-deep Networks---ArXiv 2021.10.20](https://arxiv.org/abs/2110.07641)
#### 29.2. Overview
![](.//AttentionMechanism/model/img/ParNet.png)
#### 29.3. Usage Code
```python
from AttentionMechanism.model.attention.ParNetAttention import ParNetAttention
import torch

if __name__ == '__main__':
    x = torch.randn(50, 512, 7, 7)
    pna = ParNetAttention(channel=512)
    output = pna(x)
    print(output.shape)  # torch.Size([50, 512, 7, 7])
```
***
### 30. UFO Attention Usage
#### 30.1. Paper
[UFO-ViT: High Performance Linear Vision Transformer without Softmax---ArXiv 2021.09.29](https://arxiv.org/abs/2109.14382)
#### 30.2. Overview
![](.//AttentionMechanism/model/img/UFO.png)
#### 30.3. Usage Code
```python
from AttentionMechanism.model.attention.UFOAttention import UFOAttention
import torch

if __name__ == '__main__':
    x = torch.randn(50, 49, 512)
    ufo = UFOAttention(d_model=512, d_k=512, d_v=512, h=8)
    output = ufo(x, x, x)
    print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 31. ACmix Attention Usage
#### 31.1. Paper
[On the Integration of Self-Attention and Convolution](https://arxiv.org/pdf/2111.14556.pdf)
#### 31.2. Usage Code
```python
from AttentionMechanism.model.attention.ACmix import ACmix
import torch

if __name__ == '__main__':
    x = torch.randn(50, 256, 7, 7)
    acmix = ACmix(in_planes=256, out_planes=256)
    output = acmix(x)
    print(output.shape)
```
***
### 32. MobileViTv2 Attention Usage
#### 32.1. Paper
[Separable Self-attention for Mobile Vision Transformers---ArXiv 2022.06.06](https://arxiv.org/abs/2206.02680)
#### 32.2. Overview
![](.//AttentionMechanism/model/img/MobileViTv2.png)
#### 32.3. Usage Code
```python
from AttentionMechanism.model.attention.MobileViTv2Attention import MobileViTv2Attention
import torch

if __name__ == '__main__':
    x = torch.randn(50, 49, 512)
    sa = MobileViTv2Attention(d_model=512)
    output = sa(x)
    print(output.shape)  # torch.Size([50, 49, 512])
```
***
### 33. DAT Attention Usage
#### 33.1. Paper
[Vision Transformer with Deformable Attention---CVPR2022](https://arxiv.org/abs/2201.00520)
#### 33.2. Usage Code
```python
from AttentionMechanism.model.attention.DAT import DAT
import torch

if __name__ == '__main__':
    x = torch.randn(1, 3, 224, 224)
    model = DAT(
        img_size=224,
        patch_size=4,
        num_classes=1000,
        expansion=4,
        dim_stem=96,
        dims=[96, 192, 384, 768],
        depths=[2, 2, 6, 2],
        # stage_spec: 'L' = local window attention, 'S' = shifted window, 'D' = deformable attention
        stage_spec=[['L', 'S'], ['L', 'S'], ['L', 'D', 'L', 'D', 'L', 'D'], ['L', 'D']],
        heads=[3, 6, 12, 24],
        window_sizes=[7, 7, 7, 7],
        groups=[-1, -1, 3, 6],
        use_pes=[False, False, True, True],
        dwc_pes=[False, False, False, False],
        strides=[-1, -1, 1, 1],
        sr_ratios=[-1, -1, -1, -1],
        offset_range_factor=[-1, -1, 2, 2],
        no_offs=[False, False, False, False],
        fixed_pes=[False, False, False, False],
        use_dwc_mlps=[False, False, False, False],
        use_conv_patches=False,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
    )
    output = model(x)  # the model returns a tuple; the first element is the classification logits
    print(output[0].shape)
```
***
### 34. CrossFormer Attention Usage
#### 34.1. Paper
[CROSSFORMER: A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION---ICLR 2022](https://arxiv.org/pdf/2108.00154.pdf)
#### 34.2. Usage Code
```python
from AttentionMechanism.model.attention.Crossformer import CrossFormer
import torch

if __name__ == '__main__':
    x = torch.randn(1, 3, 224, 224)
    model = CrossFormer(
        img_size=224,
        patch_size=[4, 8, 16, 32],  # multi-scale patch embedding
        in_chans=3,
        num_classes=1000,
        embed_dim=48,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        group_size=[7, 7, 7, 7],
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        drop_path_rate=0.1,
        ape=False,
        patch_norm=True,
        use_checkpoint=False,
        merge_size=[[2, 4], [2, 4], [2, 4]]
    )
    output = model(x)
    print(output.shape)  # torch.Size([1, 1000])
```
***
### 35. MOATransformer Attention Usage
#### 35.1. Paper
[Aggregating Global Features into Local Vision Transformer](https://arxiv.org/abs/2201.12903)
#### 35.2. Usage Code
```python
from AttentionMechanism.model.attention.MOATransformer import MOATransformer
import torch

if __name__ == '__main__':
    x = torch.randn(1, 3, 224, 224)
    model = MOATransformer(
        img_size=224,
        patch_size=4,
        in_chans=3,
        num_classes=1000,
        embed_dim=96,
        depths=[2, 2, 6],
        num_heads=[3, 6, 12],
        window_size=14,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        drop_path_rate=0.1,
        ape=False,
        patch_norm=True,
        use_checkpoint=False
    )
    output = model(x)
    print(output.shape)  # torch.Size([1, 1000])
```
***
### 36. CrissCrossAttention Attention Usage
#### 36.1. Paper
[CCNet: Criss-Cross Attention for Semantic Segmentation](https://arxiv.org/abs/1811.11721)
#### 36.2. Usage Code
```python
from AttentionMechanism.model.attention.CrissCrossAttention import CrissCrossAttention
import torch

if __name__ == '__main__':
    x = torch.randn(3, 64, 7, 7)
    model = CrissCrossAttention(64)
    outputs = model(x)
    print(outputs.shape)  # torch.Size([3, 64, 7, 7])
```
***
### 37. Axial_attention Attention Usage
#### 37.1. Paper
[Axial Attention in Multidimensional Transformers](https://arxiv.org/abs/1912.12180)
#### 37.2. Usage Code
```python
from AttentionMechanism.model.attention.Axial_attention import AxialImageTransformer
import torch

if __name__ == '__main__':
    x = torch.randn(3, 128, 7, 7)
    model = AxialImageTransformer(
        dim=128,
        depth=12,
        reversible=True  # reversible blocks trade extra compute for lower memory
    )
    outputs = model(x)
    print(outputs.shape)  # torch.Size([3, 128, 7, 7])
```
***
### 38. Frequency Channel Attention Usage
#### 38.1. Paper
[FcaNet: Frequency Channel Attention Networks (ICCV 2021)](https://arxiv.org/abs/2012.11879)
#### 38.2. Overview
![](.//AttentionMechanism/model/img/FCANet.png)
#### 38.3. Usage Code
```python
from AttentionMechanism.model.attention.FCA import MultiSpectralAttentionLayer
import torch

if __name__ == "__main__":
    x = torch.randn(32, 128, 64, 64)  # (b, c, h, w)
    fca_layer = MultiSpectralAttentionLayer(channel=128, dct_h=64, dct_w=64, reduction=16, freq_sel_method='top16')
    output = fca_layer(x)
    print(output.shape)  # torch.Size([32, 128, 64, 64])
```
***
### 39. Attention Augmented Convolutional Networks Usage
#### 39.1. Paper
[Attention Augmented Convolutional Networks (ICCV 2019)](https://arxiv.org/abs/1904.09925)
#### 39.2. Overview
![](.//AttentionMechanism/model/img/AAAttention.png)
#### 39.3. Usage Code
```python
from AttentionMechanism.model.attention.AAAttention import AugmentedConv
import torch

if __name__ == "__main__":
    x = torch.randn(16, 3, 32, 32)
    # dk/dv: total key and value depths, Nh: attention heads; stride=2 halves the 32x32 input
    augmented_conv = AugmentedConv(in_channels=3, out_channels=64, kernel_size=3, dk=40, dv=4, Nh=4, relative=True, stride=2, shape=16)
    output = augmented_conv(x)
    print(output.shape)  # torch.Size([16, 64, 16, 16])
```
***
### 40. Global Context Attention Usage
#### 40.1. Paper
[GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (ICCVW 2019 Best Paper)](https://arxiv.org/abs/1904.11492)
[Global Context Networks (TPAMI 2020)](https://arxiv.org/abs/2012.13375)
#### 40.2. Overview
![](.//AttentionMechanism/model/img/GCNet.png)
#### 40.3. Usage Code
```python
from AttentionMechanism.model.attention.GCAttention import GCModule
import torch

if __name__ == "__main__":
    x = torch.randn(16, 64, 32, 32)
    gc_layer = GCModule(64)
    output = gc_layer(x)
    print(output.shape)  # torch.Size([16, 64, 32, 32])
```
***
### 41. Linear Context Transform Attention Usage
#### 41.1. Paper
[Linear Context Transform Block (AAAI 2020)](https://arxiv.org/pdf/1909.03834v2)
#### 41.2. Overview
![](.//AttentionMechanism/model/img/LCTAttention.png)
#### 41.3. Usage Code
```python
from AttentionMechanism.model.attention.LCTAttention import LCT
import torch

if __name__ == "__main__":
    x = torch.randn(16, 64, 32, 32)
    attn = LCT(64, 8)  # (channels, groups)
    y = attn(x)
    print(y.shape)  # torch.Size([16, 64, 32, 32])
```
***
### 42. Gated Channel Transformation Usage
#### 42.1. Paper
[Gated Channel Transformation for Visual Recognition (CVPR 2020)](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Gated_Channel_Transformation_for_Visual_Recognition_CVPR_2020_paper.pdf)
#### 42.2. Overview
![](.//AttentionMechanism/model/img/GCT.png)
#### 42.3. Usage Code
```python
from AttentionMechanism.model.attention.GCTAttention import GCT
import torch

if __name__ == "__main__":
    x = torch.randn(16, 64, 32, 32)
    gct_layer = GCT(64)
    output = gct_layer(x)
    print(output.shape)  # torch.Size([16, 64, 32, 32])
```
***
### 43. Gaussian Context Attention Usage
#### 43.1. Paper
[Gaussian Context Transformer (CVPR 2021)](https://openaccess.thecvf.com//content/CVPR2021/papers/Ruan_Gaussian_Context_Transformer_CVPR_2021_paper.pdf)
#### 43.2. Overview
![](.//AttentionMechanism/model/img/GaussianCA.png)
#### 43.3. Usage Code
```python
from AttentionMechanism.model.attention.GaussianAttention import GCA
import torch

if __name__ == "__main__":
    x = torch.randn(16, 64, 32, 32)
    gca_layer = GCA(64)
    output = gca_layer(x)
    print(output.shape)  # torch.Size([16, 64, 32, 32])
```
***
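Since the channel-attention modules above share the `(B, C, H, W) -> (B, C, H, W)` interface, their parameter overhead can be compared directly. A small hedged sketch using two modules from the catalog, with constructor arguments taken from their usage snippets:

```python
# Hedged sketch: compare parameter counts of two channel-attention modules.
import torch
from AttentionMechanism.model.attention.SEAttention import SEAttention
from AttentionMechanism.model.attention.ECAAttention import ECAAttention

def n_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print("SEAttention :", n_params(SEAttention(channel=512, reduction=8)))
print("ECAAttention:", n_params(ECAAttention(kernel_size=3)))
```
***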
# Acknowledgements
This project drew heavily on the following open-source projects, to whose authors we express our sincere gratitude:
- [**https://github.com/xmu-xiaoma666/External-Attention-pytorch**](https://github.com/xmu-xiaoma666/External-Attention-pytorch)
- [**https://github.com/cmhungsteve/Awesome-Transformer-Attention**](https://github.com/cmhungsteve/Awesome-Transformer-Attention)