AttentionMechanism


Name: AttentionMechanism
Version: 1.0.2
Home page: https://github.com/gongyan1/Attention-Mechanism-Pytorch
Summary: This repository contains an implementation of many attention mechanism models.
Upload time: 2024-08-12 10:27:32
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.7.0
License: Apache
Keywords: attention, machine learning, deep learning, neural networks, pytorch
Requirements: No requirements were recorded.
            # Attention-Mechanism-Pytorch
This repository contains an implementation of many attention mechanism models.

# Change Log
- [x] Published Initial Attention Models, 2024-8-12.


# Table of Contents

- [Attention Series](#attention-series)
    - [1. External Attention Usage](#1-external-attention-usage)

    - [2. Self Attention Usage](#2-self-attention-usage)

    - [3. Simplified Self Attention Usage](#3-simplified-self-attention-usage)

    - [4. Squeeze-and-Excitation Attention Usage](#4-squeeze-and-excitation-attention-usage)

    - [5. SK Attention Usage](#5-sk-attention-usage)

    - [6. CBAM Attention Usage](#6-cbam-attention-usage)

    - [7. BAM Attention Usage](#7-bam-attention-usage)
    
    - [8. ECA Attention Usage](#8-eca-attention-usage)

    - [9. DANet Attention Usage](#9-danet-attention-usage)

    - [10. Pyramid Split Attention (PSA) Usage](#10-Pyramid-Split-Attention-Usage)

    - [11. Efficient Multi-Head Self-Attention (EMSA) Usage](#11-Efficient-Multi-Head-Self-Attention-Usage)

    - [12. Shuffle Attention Usage](#12-Shuffle-Attention-Usage)
    
    - [13. MUSE Attention Usage](#13-MUSE-Attention-Usage)
  
    - [14. SGE Attention Usage](#14-SGE-Attention-Usage)

    - [15. A2 Attention Usage](#15-A2-Attention-Usage)

    - [16. AFT Attention Usage](#16-AFT-Attention-Usage)

    - [17. Outlook Attention Usage](#17-Outlook-Attention-Usage)

    - [18. ViP Attention Usage](#18-ViP-Attention-Usage)

    - [19. CoAtNet Attention Usage](#19-CoAtNet-Attention-Usage)

    - [20. HaloNet Attention Usage](#20-HaloNet-Attention-Usage)

    - [21. Polarized Self-Attention Usage](#21-Polarized-Self-Attention-Usage)

    - [22. CoTAttention Usage](#22-CoTAttention-Usage)

    - [23. Residual Attention Usage](#23-Residual-Attention-Usage)
  
    - [24. S2 Attention Usage](#24-S2-Attention-Usage)

    - [25. GFNet Attention Usage](#25-GFNet-Attention-Usage)

    - [26. Triplet Attention Usage](#26-TripletAttention-Usage)

    - [27. Coordinate Attention Usage](#27-Coordinate-Attention-Usage)

    - [28. MobileViT Attention Usage](#28-MobileViT-Attention-Usage)

    - [29. ParNet Attention Usage](#29-ParNet-Attention-Usage)

    - [30. UFO Attention Usage](#30-UFO-Attention-Usage)

    - [31. ACmix Attention Usage](#31-Acmix-Attention-Usage)
  
    - [32. MobileViTv2 Attention Usage](#32-MobileViTv2-Attention-Usage)

    - [33. DAT Attention Usage](#33-DAT-Attention-Usage)

    - [34. CrossFormer Attention Usage](#34-CrossFormer-Attention-Usage)

    - [35. MOATransformer Attention Usage](#35-MOATransformer-Attention-Usage)

    - [36. CrissCrossAttention Attention Usage](#36-CrissCrossAttention-Attention-Usage)

    - [37. Axial_attention Attention Usage](#37-Axial_attention-Attention-Usage)

    - [38. Frequency Channel Attention Usage](#38-Frequency-Channel-Attention-Usage)

    - [39. Attention Augmented Convolutional Networks Usage](#39-Attention-Augmented-Convolutional-Networks-Usage)

    - [40. Global Context Attention Usage](#40-Global-Context-Attention-Usage)

    - [41. Linear Context Transform Attention Usage](#41-Linear-Context-Transform-Attention-Usage)

    - [42. Gated Channel Transformation Usage](#42-Gated-Channel-Transformation-Usage)

    - [43. Gaussian Context Attention Usage](#43-Gaussian-Context-Attention-Usage)


- [MLP Series](#mlp-series)

    - [1. RepMLP Usage](#1-RepMLP-Usage)

    - [2. MLP-Mixer Usage](#2-MLP-Mixer-Usage)

    - [3. ResMLP Usage](#3-ResMLP-Usage)

    - [4. gMLP Usage](#4-gMLP-Usage)

    - [5. sMLP Usage](#5-sMLP-Usage)

    - [6. vip-mlp Usage](#6-vip-mlp-Usage)

***
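
Most modules below follow one of two calling conventions: attention blocks for convolutional features take an `(N, C, H, W)` tensor and return a tensor of the same shape, while sequence-attention blocks take `(batch, tokens, d_model)` query/key/value tensors. Here is a minimal sketch of both patterns, assuming only the constructor arguments and shapes shown in the per-module usage sections below:

```python
# Minimal sketch of the two calling conventions used throughout this README.
# Constructor arguments and tensor shapes are copied from the usage sections below;
# everything else is illustrative.
import torch

from AttentionMechanism.model.attention.SEAttention import SEAttention
from AttentionMechanism.model.attention.SelfAttention import ScaledDotProductAttention

# 1) Convolution-style modules act on (N, C, H, W) feature maps.
feat = torch.randn(50, 512, 7, 7)
se = SEAttention(channel=512, reduction=8)
print(se(feat).shape)  # re-weighted feature map

# 2) Sequence-style modules act on (batch, tokens, d_model) and take
#    separate query / key / value tensors.
tokens = torch.randn(50, 49, 512)
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
print(sa(tokens, tokens, tokens).shape)
```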


### 1. External Attention Usage
#### 1.1. Paper
["Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks"](https://arxiv.org/abs/2105.02358)

#### 1.2. Overview
![](.//AttentionMechanism/model/img/External_Attention.png)

#### 1.3. Usage Code
```python
from AttentionMechanism.model.attention.ExternalAttention import ExternalAttention
import torch

input=torch.randn(50,49,512)
ea = ExternalAttention(d_model=512,S=8)
output=ea(input)
print(output.shape)
```

***


### 2. Self Attention Usage
#### 2.1. Paper
["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf)

#### 2.2. Overview
![](.//AttentionMechanism/model/img/SA.png)

#### 2.3. Usage Code
```python
from AttentionMechanism.model.attention.SelfAttention import ScaledDotProductAttention
import torch

input=torch.randn(50,49,512)
sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
output=sa(input,input,input)
print(output.shape)
```
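
The example above is self-attention: the same tensor is passed as queries, keys, and values. Assuming the forward call keeps that `(queries, keys, values)` order, the same module can also attend across two different token sets (cross-attention); a hedged sketch with illustrative tensor names:

```python
from AttentionMechanism.model.attention.SelfAttention import ScaledDotProductAttention
import torch

# Hedged sketch: cross-attention with the same module, assuming the forward
# signature follows the (queries, keys, values) order of the example above.
queries = torch.randn(50, 49, 512)   # e.g. one token set
context = torch.randn(50, 64, 512)   # e.g. another token set (different length)

sa = ScaledDotProductAttention(d_model=512, d_k=512, d_v=512, h=8)
output = sa(queries, context, context)
print(output.shape)  # expected to follow the query length: (50, 49, 512)
```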

***

### 3. Simplified Self Attention Usage
#### 3.1. Paper
[SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks (ICML 2021)](https://proceedings.mlr.press/v139/yang21o/yang21o.pdf)

#### 3.2. Overview
![](.//AttentionMechanism/model/img/SimAttention.png)

#### 3.3. Usage Code
```python
from AttentionMechanism.model.attention.SimplifiedSelfAttention import SimplifiedScaledDotProductAttention
import torch

input=torch.randn(50,49,512)
ssa = SimplifiedScaledDotProductAttention(d_model=512, h=8)
output=ssa(input,input,input)
print(output.shape)

```

***

### 4. Squeeze-and-Excitation Attention Usage
#### 4.1. Paper
["Squeeze-and-Excitation Networks"](https://arxiv.org/abs/1709.01507)

#### 4.2. Overview
![](.//AttentionMechanism/model/img/SE.png)

#### 4.3. Usage Code
```python
from AttentionMechanism.model.attention.SEAttention import SEAttention
import torch

input=torch.randn(50,512,7,7)
se = SEAttention(channel=512,reduction=8)
output=se(input)
print(output.shape)

```
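
Since `SEAttention` preserves the shape of its input, it can be dropped into an existing convolutional block. The `SEResidualBlock` below is a hypothetical wrapper (not part of this package) sketching one common placement: after the second convolution, before the residual addition.

```python
import torch
from torch import nn
from AttentionMechanism.model.attention.SEAttention import SEAttention

class SEResidualBlock(nn.Module):
    """Hypothetical residual block with SE recalibration (not part of this package)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEAttention(channel=channels, reduction=reduction)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)          # channel re-weighting, shape preserved
        return self.relu(out + x)   # residual addition

block = SEResidualBlock(channels=512)
print(block(torch.randn(50, 512, 7, 7)).shape)
```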

***

### 5. SK Attention Usage
#### 5.1. Paper
["Selective Kernel Networks"](https://arxiv.org/pdf/1903.06586.pdf)

#### 5.2. Overview
![](.//AttentionMechanism/model/img/SK.png)

#### 5.3. Usage Code
```python
from AttentionMechanism.model.attention.SKAttention import SKAttention
import torch

input=torch.randn(50,512,7,7)
se = SKAttention(channel=512,reduction=8)
output=se(input)
print(output.shape)

```
***

### 6. CBAM Attention Usage
#### 6.1. Paper
["CBAM: Convolutional Block Attention Module"](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf)

#### 6.2. Overview
![](.//AttentionMechanism/model/img/CBAM1.png)

![](.//AttentionMechanism/model/img/CBAM2.png)

#### 6.3. Usage Code
```python
from AttentionMechanism.model.attention.CBAM import CBAMBlock
import torch

input=torch.randn(50,512,7,7)
kernel_size=input.shape[2]
cbam = CBAMBlock(channel=512,reduction=16,kernel_size=kernel_size)
output=cbam(input)
print(output.shape)

```

***

### 7. BAM Attention Usage
#### 7.1. Paper
["BAM: Bottleneck Attention Module"](https://arxiv.org/pdf/1807.06514.pdf)

#### 7.2. Overview
![](.//AttentionMechanism/model/img/BAM.png)

#### 7.3. Usage Code
```python
from AttentionMechanism.model.attention.BAM import BAMBlock
import torch

input=torch.randn(50,512,7,7)
bam = BAMBlock(channel=512,reduction=16,dia_val=2)
output=bam(input)
print(output.shape)

```

***

### 8. ECA Attention Usage
#### 8.1. Paper
["ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks"](https://arxiv.org/pdf/1910.03151.pdf)

#### 8.2. Overview
![](.//AttentionMechanism/model/img/ECA.png)

#### 8.3. Usage Code
```python
from AttentionMechanism.model.attention.ECAAttention import ECAAttention
import torch

input=torch.randn(50,512,7,7)
eca = ECAAttention(kernel_size=3)
output=eca(input)
print(output.shape)

```
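
The ECA paper also proposes choosing the 1-D convolution kernel size adaptively from the channel count, as the nearest odd value of `(log2(C) + b) / gamma` with `gamma=2`, `b=1`. The `eca_kernel_size` helper below is our own sketch of that rule, not part of this package:

```python
import math
import torch
from AttentionMechanism.model.attention.ECAAttention import ECAAttention

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size from the ECA paper: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

channels = 512
eca = ECAAttention(kernel_size=eca_kernel_size(channels))  # k = 5 for C = 512
output = eca(torch.randn(50, channels, 7, 7))
print(output.shape)
```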

***

### 9. DANet Attention Usage
#### 9.1. Paper
["Dual Attention Network for Scene Segmentation"](https://arxiv.org/pdf/1809.02983.pdf)

#### 9.2. Overview
![](.//AttentionMechanism/model/img/danet.png)

#### 9.3. Usage Code
```python
from AttentionMechanism.model.attention.DANet import DAModule
import torch

input=torch.randn(50,512,7,7)
danet=DAModule(d_model=512,kernel_size=3,H=7,W=7)
print(danet(input).shape)

```

***

### 10. Pyramid Split Attention Usage

#### 10.1. Paper
["EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network"](https://arxiv.org/pdf/2105.14447.pdf)

#### 10.2. Overview
![](.//AttentionMechanism/model/img/psa.png)

#### 10.3. Usage Code
```python
from AttentionMechanism.model.attention.PSA import PSA
import torch

input=torch.randn(50,512,7,7)
psa = PSA(channel=512,reduction=8)
output=psa(input)
print(output.shape)

```

***


### 11. Efficient Multi-Head Self-Attention Usage

#### 11.1. Paper
["ResT: An Efficient Transformer for Visual Recognition"](https://arxiv.org/abs/2105.13677)

#### 11.2. Overview
![](.//AttentionMechanism/model/img/EMSA.png)

#### 11.3. Usage Code
```python

from AttentionMechanism.model.attention.EMSA import EMSA
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,64,512)
emsa = EMSA(d_model=512, d_k=512, d_v=512, h=8,H=8,W=8,ratio=2,apply_transform=True)
output=emsa(input,input,input)
print(output.shape)
    
```

***


### 12. Shuffle Attention Usage

#### 12.1. Paper
["SA-NET: SHUFFLE ATTENTION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS"](https://arxiv.org/pdf/2102.00240.pdf)

#### 12.2. Overview
![](.//AttentionMechanism/model/img/ShuffleAttention.jpg)

#### 12.3. Usage Code
```python

from AttentionMechanism.model.attention.ShuffleAttention import ShuffleAttention
import torch
from torch import nn
from torch.nn import functional as F


input=torch.randn(50,512,7,7)
se = ShuffleAttention(channel=512,G=8)
output=se(input)
print(output.shape)
 
```
***


### 13. MUSE Attention Usage

#### 13.1. Paper
["MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning"](https://arxiv.org/abs/1911.09483)

#### 13.2. Overview
![](.//AttentionMechanism/model/img/MUSE.png)

#### 13.3. Usage Code
```python
from AttentionMechanism.model.attention.MUSEAttention import MUSEAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,49,512)
sa = MUSEAttention(d_model=512, d_k=512, d_v=512, h=8)
output=sa(input,input,input)
print(output.shape)

```

***


### 14. SGE Attention Usage

#### 14.1. Paper
[Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks](https://arxiv.org/pdf/1905.09646.pdf)

#### 14.2. Overview
![](.//AttentionMechanism/model/img/SGE.png)

#### 14.3. Usage Code
```python
from AttentionMechanism.model.attention.SGE import SpatialGroupEnhance
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,512,7,7)
sge = SpatialGroupEnhance(groups=8)
output=sge(input)
print(output.shape)

```

***


### 15. A2 Attention Usage

#### 15.1. Paper
[A2-Nets: Double Attention Networks](https://arxiv.org/pdf/1810.11579.pdf)

#### 15.2. Overview
![](.//AttentionMechanism/model/img/A2.png)

#### 15.3. Usage Code
```python
from AttentionMechanism.model.attention.A2Atttention import DoubleAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,512,7,7)
a2 = DoubleAttention(512,128,128,True)
output=a2(input)
print(output.shape)

```

***

### 16. AFT Attention Usage

#### 16.1. Paper
[An Attention Free Transformer](https://arxiv.org/pdf/2105.14103v1.pdf)

#### 16.2. Overview
![](.//AttentionMechanism/model/img/AFT.jpg)

#### 16.3. Usage Code
```python
from AttentionMechanism.model.attention.AFT import AFT_FULL
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,49,512)
aft_full = AFT_FULL(d_model=512, n=49)
output=aft_full(input)
print(output.shape)

```
***


### 17. Outlook Attention Usage

#### 17.1. Paper


[VOLO: Vision Outlooker for Visual Recognition](https://arxiv.org/abs/2106.13112)


#### 17.2. Overview
![](.//AttentionMechanism/model/img/OutlookAttention.png)

#### 17.3. Usage Code
```python
from AttentionMechanism.model.attention.OutlookAttention import OutlookAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,28,28,512)
outlook = OutlookAttention(dim=512)
output=outlook(input)
print(output.shape)

```
***


### 18. ViP Attention Usage

#### 18.1. Paper

[Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition](https://arxiv.org/abs/2106.12368)

#### 18.2. Overview
![](.//AttentionMechanism/model/img/ViP.png)

#### 18.3. Usage Code
```python

from AttentionMechanism.model.attention.ViP import WeightedPermuteMLP
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(64,8,8,512)
seg_dim=8
vip=WeightedPermuteMLP(512,seg_dim)
out=vip(input)
print(out.shape)

```
***


### 19. CoAtNet Attention Usage

#### 19.1. Paper

[CoAtNet: Marrying Convolution and Attention for All Data Sizes](https://arxiv.org/abs/2106.04803)

#### 19.2. Overview
None

#### 19.3. Usage Code
```python

from AttentionMechanism.model.attention.CoAtNet import CoAtNet
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(1,3,224,224)
mbconv=CoAtNet(in_ch=3,image_size=224)
out=mbconv(input)
print(out.shape)

```


***

### 20. HaloNet Attention Usage

#### 20.1. Paper

[Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/pdf/2103.12731.pdf)


#### 20.2. Overview

![](.//AttentionMechanism/model/img/HaloNet.png)

#### 20.3. Usage Code
```python

from AttentionMechanism.model.attention.HaloAttention import HaloAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(1,512,8,8)
halo = HaloAttention(dim=512,
    block_size=2,
    halo_size=1,)
output=halo(input)
print(output.shape)

```
***


### 21. Polarized Self-Attention Usage

#### 21.1. Paper

[Polarized Self-Attention: Towards High-quality Pixel-wise Regression](https://arxiv.org/abs/2107.00782)

#### 21.2. Overview

![](.//AttentionMechanism/model/img/PoSA.png)

#### 21.3. Usage Code
```python

from AttentionMechanism.model.attention.PolarizedSelfAttention import ParallelPolarizedSelfAttention,SequentialPolarizedSelfAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(1,512,7,7)
psa = SequentialPolarizedSelfAttention(channel=512)
output=psa(input)
print(output.shape)

```
***


### 22. CoTAttention Usage

#### 22.1. Paper

[Contextual Transformer Networks for Visual Recognition---arXiv 2021.07.26](https://arxiv.org/abs/2107.12292) 

#### 22.2. Overview

![](.//AttentionMechanism/model/img/CoT.png)

#### 22.3. Usage Code
```python

from AttentionMechanism.model.attention.CoTAttention import CoTAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,512,7,7)
cot = CoTAttention(dim=512,kernel_size=3)
output=cot(input)
print(output.shape)

```

***


### 23. Residual Attention Usage

#### 23.1. Paper

[Residual Attention: A Simple but Effective Method for Multi-Label Recognition---ICCV2021](https://arxiv.org/abs/2108.02456) 


#### 23.2. Overview

![](.//AttentionMechanism/model/img/ResAtt.png)

#### 23.3. Usage Code
```python

from AttentionMechanism.model.attention.ResidualAttention import ResidualAttention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,512,7,7)
resatt = ResidualAttention(channel=512,num_class=1000,la=0.2)
output=resatt(input)
print(output.shape)

```

***


### 24. S2 Attention Usage

#### 24.1. Paper

[S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision---arXiv 2021.08.02](https://arxiv.org/abs/2108.01072) 

#### 24.2. Overview

![](.//AttentionMechanism/model/img/S2Attention.png)

#### 24.3. Usage Code
```python
from AttentionMechanism.model.attention.S2Attention import S2Attention
import torch
from torch import nn
from torch.nn import functional as F

input=torch.randn(50,512,7,7)
s2att = S2Attention(channels=512)
output=s2att(input)
print(output.shape)

```

***


### 25. GFNet Attention Usage

#### 25.1. Paper

[Global Filter Networks for Image Classification---arXiv 2021.07.01](https://arxiv.org/abs/2107.00645) 


#### 25.2. Overview

![](.//AttentionMechanism/model/img/GFNet.jpg)

#### 25.3. Usage Code - Implemented by [Wenliang Zhao (Author)](https://scholar.google.com/citations?user=lyPWvuEAAAAJ&hl=en)

```python
from AttentionMechanism.model.attention.gfnet import GFNet
import torch
from torch import nn
from torch.nn import functional as F

x = torch.randn(1, 3, 224, 224)
gfnet = GFNet(embed_dim=384, img_size=224, patch_size=16, num_classes=1000)
out = gfnet(x)
print(out.shape)

```

***


### 26. TripletAttention Usage

#### 26.1. Paper

[Rotate to Attend: Convolutional Triplet Attention Module---CVPR 2021](https://arxiv.org/abs/2010.03045) 

#### 26.2. Overview

![](.//AttentionMechanism/model/img/triplet.png)

#### 26.3. Usage Code - Implemented by [digantamisra98](https://github.com/digantamisra98)

```python
from AttentionMechanism.model.attention.TripletAttention import TripletAttention
import torch
from torch import nn
from torch.nn import functional as F
input=torch.randn(50,512,7,7)
triplet = TripletAttention()
output=triplet(input)
print(output.shape)
```
***


### 27. Coordinate Attention Usage

#### 27.1. Paper

[Coordinate Attention for Efficient Mobile Network Design---CVPR 2021](https://arxiv.org/abs/2103.02907)


#### 27.2. Overview

![](.//AttentionMechanism/model/img/CoordAttention.png)

#### 27.3. Usage Code - Implemented by [Andrew-Qibin](https://github.com/Andrew-Qibin)

```python
from AttentionMechanism.model.attention.CoordAttention import CoordAtt
import torch
from torch import nn
from torch.nn import functional as F

inp=torch.rand([2, 96, 56, 56])
inp_dim, oup_dim = 96, 96
reduction=32

coord_attention = CoordAtt(inp_dim, oup_dim, reduction=reduction)
output=coord_attention(inp)
print(output.shape)
```

***


### 28. MobileViT Attention Usage

#### 28.1. Paper

[MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer---ArXiv 2021.10.05](https://arxiv.org/abs/2110.02178)


#### 28.2. Overview

![](.//AttentionMechanism/model/img/MobileViTAttention.png)

#### 28.3. Usage Code

```python
from AttentionMechanism.model.attention.MobileViTAttention import MobileViTAttention
import torch
from torch import nn
from torch.nn import functional as F

if __name__ == '__main__':
    m=MobileViTAttention()
    input=torch.randn(1,3,49,49)
    output=m(input)
    print(output.shape)  #output:(1,3,49,49)
    
```

***


### 29. ParNet Attention Usage

#### 29.1. Paper

[Non-deep Networks---ArXiv 2021.10.20](https://arxiv.org/abs/2110.07641)


#### 29.2. Overview

![](.//AttentionMechanism/model/img/ParNet.png)

#### 29.3. Usage Code

```python
from AttentionMechanism.model.attention.ParNetAttention import *
import torch
from torch import nn
from torch.nn import functional as F

if __name__ == '__main__':
    input=torch.randn(50,512,7,7)
    pna = ParNetAttention(channel=512)
    output=pna(input)
    print(output.shape) #50,512,7,7
    
```

***


### 30. UFO Attention Usage

#### 30.1. Paper

[UFO-ViT: High Performance Linear Vision Transformer without Softmax---ArXiv 2021.09.29](https://arxiv.org/abs/2109.14382)


#### 30.2. Overview

![](.//AttentionMechanism/model/img/UFO.png)

#### 30.3. Usage Code

```python
from AttentionMechanism.model.attention.UFOAttention import *
import torch
from torch import nn
from torch.nn import functional as F

if __name__ == '__main__':
    input=torch.randn(50,49,512)
    ufo = UFOAttention(d_model=512, d_k=512, d_v=512, h=8)
    output=ufo(input,input,input)
    print(output.shape) #[50, 49, 512]
    
```

***

### 31. ACmix Attention Usage

#### 31.1. Paper

[On the Integration of Self-Attention and Convolution](https://arxiv.org/pdf/2111.14556.pdf)

#### 31.2. Usage Code

```python
from AttentionMechanism.model.attention.ACmix import ACmix
import torch

if __name__ == '__main__':
    input=torch.randn(50,256,7,7)
    acmix = ACmix(in_planes=256, out_planes=256)
    output=acmix(input)
    print(output.shape)
    
```
***

### 32. MobileViTv2 Attention Usage

#### 32.1. Paper

[Separable Self-attention for Mobile Vision Transformers---ArXiv 2022.06.06](https://arxiv.org/abs/2206.02680)


#### 32.2. Overview

![](.//AttentionMechanism/model/img/MobileViTv2.png)

#### 32.3. Usage Code

```python
from AttentionMechanism.model.attention.MobileViTv2Attention import MobileViTv2Attention
import torch
from torch import nn
from torch.nn import functional as F

if __name__ == '__main__':
    input=torch.randn(50,49,512)
    sa = MobileViTv2Attention(d_model=512)
    output=sa(input)
    print(output.shape)
    
```
***

### 33. DAT Attention Usage

#### 33.1. Paper

[Vision Transformer with Deformable Attention---CVPR2022](https://arxiv.org/abs/2201.00520)

#### 33.2. Usage Code

```python
from AttentionMechanism.model.attention.DAT import DAT
import torch

if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = DAT(
        img_size=224,
        patch_size=4,
        num_classes=1000,
        expansion=4,
        dim_stem=96,
        dims=[96, 192, 384, 768],
        depths=[2, 2, 6, 2],
        stage_spec=[['L', 'S'], ['L', 'S'], ['L', 'D', 'L', 'D', 'L', 'D'], ['L', 'D']],
        heads=[3, 6, 12, 24],
        window_sizes=[7, 7, 7, 7] ,
        groups=[-1, -1, 3, 6],
        use_pes=[False, False, True, True],
        dwc_pes=[False, False, False, False],
        strides=[-1, -1, 1, 1],
        sr_ratios=[-1, -1, -1, -1],
        offset_range_factor=[-1, -1, 2, 2],
        no_offs=[False, False, False, False],
        fixed_pes=[False, False, False, False],
        use_dwc_mlps=[False, False, False, False],
        use_conv_patches=False,
        drop_rate=0.0,
        attn_drop_rate=0.0,
        drop_path_rate=0.2,
    )
    output=model(input)
    print(output[0].shape)
    
```
***

### 34. CrossFormer Attention Usage

#### 34.1. Paper

[CROSSFORMER: A VERSATILE VISION TRANSFORMER HINGING ON CROSS-SCALE ATTENTION---ICLR 2022](https://arxiv.org/pdf/2108.00154.pdf)

#### 34.2. Usage Code

```python
from AttentionMechanism.model.attention.Crossformer import CrossFormer
import torch

if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = CrossFormer(img_size=224,
        patch_size=[4, 8, 16, 32],
        in_chans= 3,
        num_classes=1000,
        embed_dim=48,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        group_size=[7, 7, 7, 7],
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        drop_path_rate=0.1,
        ape=False,
        patch_norm=True,
        use_checkpoint=False,
        merge_size=[[2, 4], [2,4], [2, 4]]
    )
    output=model(input)
    print(output.shape)
    
```
***

### 35. MOATransformer Attention Usage

#### 35.1. Paper

[Aggregating Global Features into Local Vision Transformer](https://arxiv.org/abs/2201.12903)

#### 35.2. Usage Code

```python
from AttentionMechanism.model.attention.MOATransformer import MOATransformer
import torch

if __name__ == '__main__':
    input=torch.randn(1,3,224,224)
    model = MOATransformer(
        img_size=224,
        patch_size=4,
        in_chans=3,
        num_classes=1000,
        embed_dim=96,
        depths=[2, 2, 6],
        num_heads=[3, 6, 12],
        window_size=14,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.0,
        drop_path_rate=0.1,
        ape=False,
        patch_norm=True,
        use_checkpoint=False
    )
    output=model(input)
    print(output.shape)
    
```
***

### 36. CrissCrossAttention Attention Usage

#### 36.1. Paper

[CCNet: Criss-Cross Attention for Semantic Segmentation](https://arxiv.org/abs/1811.11721)

#### 36.2. Usage Code

```python
from AttentionMechanism.model.attention.CrissCrossAttention import CrissCrossAttention
import torch

if __name__ == '__main__':
    input=torch.randn(3, 64, 7, 7)
    model = CrissCrossAttention(64)
    outputs = model(input)
    print(outputs.shape)
    
```
***

### 37. Axial_attention Attention Usage

#### 37.1. Paper

[Axial Attention in Multidimensional Transformers](https://arxiv.org/abs/1912.12180)

#### 37.2. Usage Code

```python
from AttentionMechanism.model.attention.Axial_attention import AxialImageTransformer
import torch

if __name__ == '__main__':
    input=torch.randn(3, 128, 7, 7)
    model = AxialImageTransformer(
        dim = 128,
        depth = 12,
        reversible = True
    )
    outputs = model(input)
    print(outputs.shape)
    
```
***

### 38. Frequency Channel Attention Usage

#### 38.1. Paper

[FcaNet: Frequency Channel Attention Networks (ICCV 2021)](https://arxiv.org/abs/2012.11879)

#### 38.2. Overview

![](.//AttentionMechanism/model/img/FCANet.png)

#### 38.3. Usage Code

```python
from AttentionMechanism.model.attention.FCA import MultiSpectralAttentionLayer
import torch

if __name__ == "__main__":
    input = torch.randn(32, 128, 64, 64) # (b, c, h, w)
    fca_layer = MultiSpectralAttentionLayer(channel = 128, dct_h = 64, dct_w = 64, reduction = 16, freq_sel_method = 'top16')
    output = fca_layer(input)
    print(output.shape)
    
```
***

### 39. Attention Augmented Convolutional Networks Usage

#### 39.1. Paper

[Attention Augmented Convolutional Networks (ICCV 2019)](https://arxiv.org/abs/1904.09925)

#### 39.2. Overview

![](.//AttentionMechanism/model/img/AAAttention.png)

#### 39.3. Usage Code

```python
from AttentionMechanism.model.attention.AAAttention import AugmentedConv
import torch

if __name__ == "__main__":
    input = torch.randn((16, 3, 32, 32))
    augmented_conv = AugmentedConv(in_channels=3, out_channels=64, kernel_size=3, dk=40, dv=4, Nh=4, relative=True, stride=2, shape=16)
    output = augmented_conv(input)
    print(output.shape)
    
```
***

### 40. Global Context Attention Usage

#### 40.1. Paper

[GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (ICCVW 2019 Best Paper)](https://arxiv.org/abs/1904.11492)

[Global Context Networks (TPAMI 2020)](https://arxiv.org/abs/2012.13375)

#### 40.2. Overview

![](.//AttentionMechanism/model/img/GCNet.png)

#### 40.3. Usage Code

```python
from AttentionMechanism.model.attention.GCAttention import GCModule
import torch

if __name__ == "__main__":
    input = torch.randn(16, 64, 32, 32)
    gc_layer = GCModule(64)
    output = gc_layer(input)
    print(output.shape)
    
```
***

### 41. Linear Context Transform Attention Usage

#### 41.1. Paper

[Linear Context Transform Block (AAAI 2020)](https://arxiv.org/pdf/1909.03834v2)

#### 41.2. Overview

![](.//AttentionMechanism/model/img/LCTAttention.png)

#### 41.3. Usage Code

```python
from AttentionMechanism.model.attention.LCTAttention import LCT
import torch

if __name__ == "__main__":
    x = torch.randn(16, 64, 32, 32)
    attn = LCT(64, 8)
    y = attn(x)
    print(y.shape)
    
```
***

### 42. Gated Channel Transformation Usage

#### 42.1. Paper

[Gated Channel Transformation for Visual Recognition (CVPR 2020)](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Gated_Channel_Transformation_for_Visual_Recognition_CVPR_2020_paper.pdf)

#### 42.2. Overview

![](.//AttentionMechanism/model/img/GCT.png)

#### 42.3. Usage Code

```python
from AttentionMechanism.model.attention.GCTAttention import GCT
import torch

if __name__ == "__main__":
    input = torch.randn(16, 64, 32, 32)
    gct_layer = GCT(64)
    output = gct_layer(input)
    print(output.shape)
    
```
***

### 43. Gaussian Context Attention Usage

#### 43.1. Paper

[Gaussian Context Transformer (CVPR 2021)](https://openaccess.thecvf.com//content/CVPR2021/papers/Ruan_Gaussian_Context_Transformer_CVPR_2021_paper.pdf)

#### 43.2. Overview

![](.//AttentionMechanism/model/img/GaussianCA.png)

#### 43.3. Usage Code

```python
from AttentionMechanism.model.attention.GaussianAttention import GCA
import torch

if __name__ == "__main__":
    input = torch.randn(16, 64, 32, 32)
    gca_layer = GCA(64)
    output = gca_layer(input)
    print(output.shape)
    
```
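
The lightweight channel-attention modules in sections 40–43 share one calling convention: construct from the channel count, apply to an `(N, C, H, W)` tensor, and get a same-shaped tensor back. A hedged sketch that swaps them behind a single loop, reusing the constructor arguments from the usage examples above:

```python
import torch
from AttentionMechanism.model.attention.GCAttention import GCModule
from AttentionMechanism.model.attention.LCTAttention import LCT
from AttentionMechanism.model.attention.GCTAttention import GCT
from AttentionMechanism.model.attention.GaussianAttention import GCA

# Constructor arguments follow the usage examples in sections 40-43;
# the comparison loop itself is only illustrative.
x = torch.randn(16, 64, 32, 32)
modules = {
    "GCModule (Global Context)": GCModule(64),
    "LCT (Linear Context Transform)": LCT(64, 8),
    "GCT (Gated Channel Transformation)": GCT(64),
    "GCA (Gaussian Context Attention)": GCA(64),
}
for name, layer in modules.items():
    print(name, layer(x).shape)
```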

***

# Acknowledgements
During the development of this project, the following open-source projects provided significant help and support. We hereby express our sincere gratitude:

- [**https://github.com/xmu-xiaoma666/External-Attention-pytorch**](https://github.com/xmu-xiaoma666/External-Attention-pytorch)

- [**https://github.com/cmhungsteve/Awesome-Transformer-Attention**](https://github.com/cmhungsteve/Awesome-Transformer-Attention)



            
