# natten

- Name: natten
- Version: 0.21.0
- Home page: https://natten.org
- Summary: Neighborhood Attention Extension.
- Author: Ali Hassani
- Requires Python: >=3.9
- Keywords: sparse attention, deep learning, machine learning, ml, artificial intelligence, ai
- Upload time: 2025-07-14 22:17:52
<img src="https://natten.org/assets/natten_light.png" width="384" />

*Neighborhood Attention Extension*

<a href="https://natten.org">Documentation / Wheels</a>

NATTEN is an open-source project dedicated to providing infrastructure for
[Neighborhood Attention (NA)](https://openaccess.thecvf.com/content/CVPR2023/html/Hassani_Neighborhood_Attention_Transformer_CVPR_2023_paper.html),
a sliding window self-attention mechanism, and its extensions
([dilated NA](https://arxiv.org/abs/2209.15001),
[causal NA](https://arxiv.org/abs/2403.04690),
[strided NA](https://arxiv.org/abs/2504.16922)).
Specifically, we provide Fused Multi-Headed Attention (FMHA) and
[Fused Neighborhood Attention (FNA)](https://arxiv.org/abs/2403.04690)
training and inference kernels for all NVIDIA architectures since Maxwell (SM50).
We also ship
[Hopper (SM90) and Blackwell (SM100)](https://arxiv.org/abs/2504.16922) native kernels, offering
speedups proportional to the reduction in FLOPs over cuDNN and Flash Attention 3.

Neighborhood Attention introduces locality and sparsity into self-attention in a manner similar to
convolution.
This means that for any self-attention problem, you can specify a `kernel_size`, `stride`,
and `dilation`. Because it is still attention, you can also toggle causal masking.

NATTEN focuses on **multi-dimensional** layouts of tokens (i.e.
[2-D](https://natten.org/operations/#natten.na2d) and
[3-D](https://natten.org/operations/#natten.na3d) feature maps).
Users are free to explore the large parameter space that NATTEN offers, in which the
attention span along each dimension/axis of your input is controlled by its respective
`kernel_size`, `stride`, `dilation`, and `is_causal` parameters (see the code sketch after the
visualizations below).


| <img src="https://natten.org/assets/viz/na.png" width="320" /> | <img src="https://natten.org/assets/viz/dina.png" width="320" /> |
| ---                                              | ---                                                |
| `kernel_size=(6,6)`                              | `kernel_size=(6,6)`                                |
|                                                  | `dilation=(2,2)`                                   |


| <img src="https://natten.org/assets/viz/cna.png" width="320" /> | <img src="https://natten.org/assets/viz/gna.png" width="320" /> |
| ---                                               | ---                                               |
| `kernel_size=(6,6)`                               | `kernel_size=(6,6)`                               |
| `is_causal=(True,True)`                           | `stride=(2,2)`                                    |
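
To make the visualized configurations concrete, here is a minimal sketch of each one as a call to
the [`na2d`](https://natten.org/operations/#natten.na2d) operation. The tensor layout below is an
assumption for illustration; consult the operations documentation for the authoritative signature
and supported dtypes/devices.

```python
import torch
from natten import na2d  # na3d is the analogous op for 3-D feature maps

# Assumed layout: [batch, height, width, heads, head_dim]; see
# https://natten.org/operations/#natten.na2d for the exact signature.
B, H, W, heads, dim = 1, 16, 16, 4, 64
q, k, v = (
    torch.randn(B, H, W, heads, dim, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

out = na2d(q, k, v, kernel_size=(6, 6))                          # standard NA
out = na2d(q, k, v, kernel_size=(6, 6), dilation=(2, 2))         # dilated NA
out = na2d(q, k, v, kernel_size=(6, 6), is_causal=(True, True))  # causal NA
out = na2d(q, k, v, kernel_size=(6, 6), stride=(2, 2))           # strided NA
```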


## Getting started

NATTEN supports PyTorch >= 2.7 and Python >= 3.9 (everything PyTorch supports).
Please refer to [install instructions](https://natten.org/install/) for details on how to install NATTEN.
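
As a quick post-install sanity check, a minimal sketch (`__version__` is standard packaging
metadata; the CUDA line only reports availability):

```python
import torch
import natten

# Confirm the environment matches the stated requirements (PyTorch >= 2.7).
print("torch:", torch.__version__)
print("natten:", natten.__version__)
print("CUDA available:", torch.cuda.is_available())
```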

### [NEW] Release `0.21.0`

NATTEN has undergone major changes since the last release (`0.17.5`), so we strongly recommend
reading the updated documentation on this website before upgrading.

Our latest release ships our [Hopper FNA](https://natten.org/backends/#hopper-fna-fmha) and
[Blackwell FNA](https://natten.org/backends/#blackwell-fna-fmha) kernels, bringing you
[massive speedups](https://natten.org/profiler/#hopper-and-blackwell-examples) on
modern data center class NVIDIA GPUs such as the H100 and B200.
It also speeds up inference with our existing
[Ampere FNA](https://natten.org/backends/#cutlass-fna-fmha) kernels by up to 1.47X in fully
block-sparse cases, provides much cleaner error reporting, ships with our
[profiling toolkit](https://natten.org/profiler/), and much more!

## License
NATTEN is released under the [MIT License](https://github.com/SHI-Labs/NATTEN/tree/main/LICENSE).

## Citation
If you found NATTEN or neighborhood attention useful in your work, consider citing the appropriate
papers:

### Original neighborhood attention paper
The first work proposing neighborhood attention and introducing NATTEN.

```bibtex
@inproceedings{hassani2023neighborhood,
  title        = {Neighborhood Attention Transformer},
  author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
  year         = 2023,
  booktitle    = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}
```

### Dilated neighborhood attention
Introduced `dilation` to capture sparse global context.

```bibtex
@article{hassani2022dilated,
  title        = {Dilated Neighborhood Attention Transformer},
  author       = {Ali Hassani and Humphrey Shi},
  year         = 2022,
  journal      = {arXiv preprint arXiv:2209.15001}
}
```

### GEMM-based and fused neighborhood attention

Introduced the first multi-dimensional attention kernels: GEMM-based and fused neighborhood
attention (FNA).

Also introduced causal neighborhood attention, and extended the implementation to support varying
parameters across different dimensions.

```bibtex
@inproceedings{hassani2024faster,
  title        = {Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level},
  author       = {Ali Hassani and Wen-Mei Hwu and Humphrey Shi},
  year         = 2024,
  booktitle    = {Advances in Neural Information Processing Systems},
}
```

### Generalized neighborhood attention: towards speed-of-light performance
Introduced even-sized windows, strided neighborhood attention, block-sparse forms of neighborhood
attention, NATTEN Simulator, and our new Hopper and Blackwell FNA kernels, implemented with
out-of-kernel token permutation.

```bibtex
@article{hassani2025generalized,
  title        = {Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light},
  author       = {Hassani, Ali and Zhou, Fengzhe and Kane, Aditya and Huang, Jiannan and Chen, Chieh-Yun and Shi, Min and Walton, Steven and Hoehnerbach, Markus and Thakkar, Vijay and Isaev, Michael and others},
  year         = 2025,
  journal      = {arXiv preprint arXiv:2504.16922}
}
```

## Acknowledgements

We thank NVIDIA, and the [CUTLASS project](https://github.com/NVIDIA/cutlass/), without which this
project would not have been possible.

We also thank Meta and the [xFormers](https://github.com/facebookresearch/xformers/) team
for their FMHA kernel, and the [PyTorch](https://github.com/pytorch/pytorch/) project and team.

            
