<img src="https://natten.org/assets/natten_light.png" width="384" />
*Neighborhood Attention Extension*
<a href="https://natten.org">Documentation / Wheels</a>
NATTEN is an open-source project dedicated to providing infrastructure for
[Neighborhood Attention (NA)](https://openaccess.thecvf.com/content/CVPR2023/html/Hassani_Neighborhood_Attention_Transformer_CVPR_2023_paper.html),
a sliding window self-attention mechanism, and its extensions
([dilated NA](https://arxiv.org/abs/2209.15001),
[causal NA](https://arxiv.org/abs/2403.04690),
[strided NA](https://arxiv.org/abs/2504.16922)).
Specifically, we provide Fused Multi-Headed Attention (FMHA) and
[Fused Neighborhood Attention (FNA)](https://arxiv.org/abs/2403.04690)
training and inference kernels for all NVIDIA architectures since Maxwell (SM50).
We also ship
[Hopper (SM90) and Blackwell (SM100)](https://arxiv.org/abs/2504.16922) native kernels, offering
speedups over cuDNN and Flash Attention 3 proportional to the reduction in FLOPs.
Neighborhood Attention introduces locality and sparsity into self-attention in a manner similar to
convolution.
This means that for any self-attention problem, you can specify a `kernel_size`, `stride`,
and `dilation`. Because it's attention, you can also toggle causal masking.
NATTEN focuses on **multi-dimensional** layouts of tokens (i.e.
[2-D](https://natten.org/operations/#natten.na2d) and
[3-D](https://natten.org/operations/#natten.na3d) feature maps).
Users are free to explore the massive parameter space that NATTEN offers, in which the
attention span along any dimension/axis of the input is controlled by its respective
`kernel_size`, `stride`, `dilation`, and `is_causal` parameters.
| <img src="https://natten.org/assets/viz/na.png" width="320" /> | <img src="https://natten.org/assets/viz/dina.png" width="320" /> |
| --- | --- |
| `kernel_size=(6,6)` | `kernel_size=(6,6)` |
| | `dilation=(2,2)` |
| <img src="https://natten.org/assets/viz/cna.png" width="320" /> | <img src="https://natten.org/assets/viz/gna.png" width="320" /> |
| --- | --- |
| `kernel_size=(6,6)` | `kernel_size=(6,6)` |
| `is_causal=(True,True)` | `stride=(2,2)` |
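
To make the parameters above concrete, here is a minimal sketch of calling the functional
[`na2d`](https://natten.org/operations/#natten.na2d) op. The tensor layout
`(batch, height, width, heads, head_dim)` is an assumption for illustration; consult the
[operations docs](https://natten.org/operations/) for the exact signature your version expects.

```python
import torch
from natten import na2d  # functional 2-D neighborhood attention

# Assumed layout for illustration: (batch, height, width, heads, head_dim).
q = torch.randn(1, 32, 32, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Each axis gets its own window size, stride, dilation, and causal flag.
out = na2d(
    q, k, v,
    kernel_size=(6, 6),        # 6x6 sliding window, as visualized above
    stride=(2, 2),             # strided NA
    dilation=(2, 2),           # sparse global context
    is_causal=(False, False),  # per-axis causal masking
)
```

The same pattern extends to 3-D feature maps via `na3d`, with one more entry in each tuple.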
## Getting started
NATTEN supports PyTorch >= 2.7 and Python >= 3.9 (everything PyTorch supports).
Please refer to [install instructions](https://natten.org/install/) for details on how to install NATTEN.
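
If you already have an environment set up, here is a quick sanity check against the version
requirements above (a minimal sketch using only the standard library and `torch`):

```python
import sys
import torch

# NATTEN requires Python >= 3.9 and PyTorch >= 2.7.
assert sys.version_info >= (3, 9), "NATTEN requires Python >= 3.9"
torch_major, torch_minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (torch_major, torch_minor) >= (2, 7), "NATTEN requires PyTorch >= 2.7"
print("Environment looks compatible with NATTEN.")
```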
### [NEW] Release `0.21.0`
NATTEN has undergone major changes since the last release (`0.17.5`), so we strongly recommend
reading the updated documentation on this website before upgrading.
Our latest release ships our [Hopper FNA](https://natten.org/backends/#hopper-fna-fmha) and
[Blackwell FNA](https://natten.org/backends/#blackwell-fna-fmha) kernels, bringing you
[massive speedups](https://natten.org/profiler/#hopper-and-blackwell-examples) on
modern data-center-class NVIDIA GPUs such as the H100 and B200.
It also speeds up inference in our existing
[Ampere FNA](https://natten.org/backends/#cutlass-fna-fmha) kernels by up to 1.47X in fully
block-sparse cases, provides much cleaner error reporting, ships with our
[profiling toolkit](https://natten.org/profiler/), and so much more!
## License
NATTEN is released under the [MIT License](https://github.com/SHI-Labs/NATTEN/tree/main/LICENSE).
## Citation
If you found NATTEN or neighborhood attention useful in your work, consider citing the appropriate
papers:
### Original neighborhood attention paper
First work proposing neighborhood attention, and introducing NATTEN.
```bibtex
@inproceedings{hassani2023neighborhood,
title = {Neighborhood Attention Transformer},
author = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
year = 2023,
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}
```
### Dilated neighborhood attention
Introduced `dilation` for capturing sparse global context.
```bibtex
@article{hassani2022dilated,
title = {Dilated Neighborhood Attention Transformer},
author = {Ali Hassani and Humphrey Shi},
year = 2022,
journal = {arXiv preprint arXiv:2209.15001}
}
```
### GEMM-based and fused neighborhood attention
Introduced the first multi-dimensional attention kernels: GEMM-based and fused neighborhood
attention (FNA).
Introduced causal neighborhood attention, and extended the implementation to support varying
parameters across different dimensions.
```bibtex
@inproceedings{hassani2024faster,
title = {Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level},
author = {Ali Hassani and Wen-Mei Hwu and Humphrey Shi},
year = 2024,
booktitle = {Advances in Neural Information Processing Systems},
}
```
### Generalized neighborhood attention: towards speed-of-light performance
Introduced even-sized windows, strided neighborhood attention, block-sparse forms of neighborhood
attention, NATTEN Simulator, and our new Hopper and Blackwell FNA kernels, implemented with
out-of-kernel token permutation.
```bibtex
@article{hassani2025generalized,
title = {Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light},
author = {Hassani, Ali and Zhou, Fengzhe and Kane, Aditya and Huang, Jiannan and Chen, Chieh-Yun and Shi, Min and Walton, Steven and Hoehnerbach, Markus and Thakkar, Vijay and Isaev, Michael and others},
year = 2025,
journal = {arXiv preprint arXiv:2504.16922}
}
```
## Acknowledgements
We thank NVIDIA and the [CUTLASS project](https://github.com/NVIDIA/cutlass/), without which this
project would not have been possible.
We also thank Meta and the [xFormers](https://github.com/facebookresearch/xformers/) team
for their FMHA kernel, and the [PyTorch](https://github.com/pytorch/pytorch/) project and team.