natten

Name: natten
Version: 0.17.0
Home page: https://github.com/SHI-Labs/NATTEN
Summary: Neighborhood Attention Extension.
Upload time: 2024-05-02 17:30:57
Author: Ali Hassani
Requires Python: >=3.8
Keywords: machine learning, science, ml, artificial intelligence, ai
Requirements: packaging, torch
<img src="https://www.shi-labs.com/natten/assets/img/natten_light.png" width="384" />

*Neighborhood Attention Extension*

Bringing attention to a neighborhood near you!

<a href="https://www.shi-labs.com/natten/">Website / Releases</a>
| <a href="https://github.com/SHI-Labs/NATTEN/tree/main/docs/">Documentation</a>

<div align="center">
  <img alt="Visualization of neighborhood attention in 2D." src="https://shi-labs.com/natten/pypi-assets/docs/assets/neighborhood_attn_2d_vis_light.png" width="384" />
  <img alt="Visualization of dilated neighborhood attention in 2D." src="https://shi-labs.com/natten/pypi-assets/docs/assets/dilated_neighborhood_attn_2d_vis_light.png" width="384" />
</div>

NATTEN is an open-source project dedicated to providing fast implementations for
[Neighborhood Attention](https://scholar.google.com/citations?view_op=view_citation&citation_for_view=Ndu0dUcAAAAJ:b0M2c_1WBrUC),
a sliding window self-attention mechanism.
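
To make the mechanism concrete, here is a deliberately naive pure-PyTorch reference for the 1D case (an illustrative sketch only, not NATTEN's implementation): each query attends to its `kernel_size` nearest neighbors, with the window clamped at the sequence boundaries.

```python
# Naive 1D neighborhood attention reference, for illustration only.
import torch

def naive_neighborhood_attention_1d(q, k, v, kernel_size):
    # q, k, v: (batch, heads, length, head_dim); kernel_size is odd and <= length.
    B, H, L, D = q.shape
    assert kernel_size % 2 == 1 and kernel_size <= L
    scale = D ** -0.5
    out = torch.empty_like(q)
    for i in range(L):
        # Clamp the window at the edges so every query attends to exactly
        # `kernel_size` nearest neighbors.
        start = min(max(i - kernel_size // 2, 0), L - kernel_size)
        nk = k[:, :, start:start + kernel_size]  # (B, H, kernel_size, D)
        nv = v[:, :, start:start + kernel_size]
        attn = torch.einsum("bhd,bhkd->bhk", q[:, :, i] * scale, nk)
        out[:, :, i] = torch.einsum("bhk,bhkd->bhd", attn.softmax(dim=-1), nv)
    return out

q = k = v = torch.randn(1, 2, 16, 8)
print(naive_neighborhood_attention_1d(q, k, v, kernel_size=5).shape)  # torch.Size([1, 2, 16, 8])
```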

If you're not familiar with neighborhood attention, please refer to 
[our papers](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer), or watch our 
[YouTube video](https://www.youtube.com/watch?v=Ya4BfioxIHA) from CVPR 2023.

To read more about our GEMM-based and fused neighborhood attention kernels, please refer to
our new preprint, [Faster Neighborhood Attention](https://arxiv.org/abs/2403.04690).

## New: Fused Neighborhood Attention now supports backpropagation!

We've released the Fused Neighborhood Attention (FNA) backward kernel and interface, which means you can now
train models based on neighborhood attention faster and more efficiently.

FNA can be seen as a generalization of methods such as [Flash Attention](https://github.com/Dao-AILab/flash-attention/) and
[FMHA](https://github.com/facebookresearch/xformers/) from back-to-back matrix multiplication to
back-to-back tensor-tensor contraction, and comes with neighborhood attention masking built in.
This accelerates neighborhood attention, a multi-dimensional sliding window attention pattern,
by never storing the attention tensor to global memory, which reduces both the global memory footprint
and the memory bandwidth bottleneck.
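
For a rough sense of scale, the sketch below estimates the size of the attention-weights tensor that an unfused 2D NA op would materialize in global memory; the problem shape is purely illustrative, not a benchmark.

```python
# Illustrative only: the attention tensor an unfused 2D NA op writes to global
# memory, which FNA instead keeps in registers / shared memory.
batch, heads, height, width, window = 8, 4, 56, 56, 7
elems = batch * heads * height * width * window * window  # one weight per (query, neighbor) pair
print(f"unfused attention tensor: {elems * 2 / 2**20:.1f} MiB in fp16")  # ~9.4 MiB
```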

<div align="center">
  <img alt="Op-level average speedup." src="https://shi-labs.com/natten/pypi-assets/assets/fna-chart-light.png" />
</div>

We highly recommend reading the [FNA quick start](https://github.com/SHI-Labs/NATTEN/tree/main/docs/fna/fna-quickstart.md) or
the [Fused vs unfused NA](https://github.com/SHI-Labs/NATTEN/tree/main/docs/fna/fused-vs-unfused.md) guide before
starting to use FNA, since its interface, memory layout, and feature set differ from
those of NATTEN's unfused ops.
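
As a taste of what that looks like in practice, the sketch below shows one way FNA might be enabled through the module interface; the `natten.use_fused_na` toggle and the `NeighborhoodAttention2D` constructor arguments are assumptions on our part, so treat the quick start guide above as the source of truth.

```python
# Hypothetical sketch of opting into FNA via the module interface; the
# `use_fused_na` toggle and the `NeighborhoodAttention2D` signature are assumed
# here -- see the FNA quick start for the actual interface.
import torch
import natten

natten.use_fused_na(True)  # assumed global switch routing NA modules to the fused kernels

# 2D neighborhood attention: 128-dim embedding, 4 heads, 7x7 window.
na2d = natten.NeighborhoodAttention2D(dim=128, num_heads=4, kernel_size=7).cuda()

x = torch.randn(2, 56, 56, 128, device="cuda")  # channels-last (B, H, W, C) layout
with torch.autocast("cuda", dtype=torch.float16):
    y = na2d(x)
print(y.shape)  # torch.Size([2, 56, 56, 128])
```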

## Getting started
 
NATTEN supports PyTorch version 2.0 and later, and Python versions 3.8 and above. 
Python 3.12 is only supported with torch >= 2.2.0.

Older NATTEN releases supported Python >= 3.7 and torch >= 1.8.

Please refer to the [install instructions](https://github.com/SHI-Labs/NATTEN/tree/main/docs/install.md) to find out whether your operating system and hardware accelerator are
compatible with NATTEN.
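
As a quick self-check against the requirements above (a small sketch; the install instructions remain the authoritative reference):

```python
# Sanity-check the stated requirements: Python >= 3.8, torch >= 2.0,
# and torch >= 2.2.0 when running on Python 3.12.
import sys
import torch
from packaging.version import Version

torch_version = Version(torch.__version__.split("+")[0])
assert sys.version_info >= (3, 8), "NATTEN requires Python 3.8 or later"
assert torch_version >= Version("2.0.0"), "NATTEN requires torch >= 2.0"
if sys.version_info >= (3, 12):
    assert torch_version >= Version("2.2.0"), "Python 3.12 requires torch >= 2.2.0"
print("Environment looks compatible with NATTEN 0.17.0")
```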

## Feature availability

| Problem space | CPU backend | CUDA backend     |
| -----------   | ----------- | ---------------- |
| 1D            | naive       | naive, gemm, fna |
| 2D            | naive       | naive, gemm, fna |
| 3D            | naive       | naive, fna       |

### CPU

| Problem space | CPU Backend | Causal masking     | Varying parameters | Relative positional bias | Autograd support         |
| -----------   | ----------- | ------------------ | ------------------ | ------------------------ | ------------------------ |
| 1D            | naive       | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode |
| 2D            | naive       | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode |
| 3D            | naive       | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode |

Notes:
* Forward mode autograd does not support relative positional biases and causal masking yet.
* Relative positional biases are not yet supported when any axis has causal masking enabled.

### CUDA

| Problem space | CUDA Backend | Causal masking     | Varying parameters | Relative positional bias | Autograd support         | Min. Arch |
| -----------   | -----------  | ------------------ | ------------------ | ------------------------ | ------------------------ | --------- |
| 1D            | naive        | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode | SM35      |
| 2D            | naive        | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode | SM35      |
| 3D            | naive        | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Forward and reverse mode | SM35      |
| 1D            | gemm         | -                  | -                  | :white_check_mark:       | Forward and reverse mode | SM70      |
| 2D            | gemm         | -                  | -                  | :white_check_mark:       | Forward and reverse mode | SM70      |
| 1D            | fna          | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Reverse mode             | SM50      |
| 2D            | fna          | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Reverse mode             | SM50      |
| 3D            | fna          | :white_check_mark: | :white_check_mark: | :white_check_mark:       | Reverse mode             | SM50      |

Notes: 
* FP16 kernels require SM50 or above (with the exceptions below), and BF16 kernels require SM80 or above.
  * Naive FP16 kernels require **SM60** or above.
  * FNA FP16 kernels require SM50 or above.
* GEMM backend on SM70 and SM75 can only do FP16.
* The tiled kernels implement only 1/3 of the ops, are only available for 2D problems, and require head dim = 32.
* Forward mode autograd does not support relative positional biases and causal masking yet.
* Relative positional biases are not yet supported when any axis has causal masking enabled.
* Relative positional biases are not supported in FNA during backward pass.

Features that will likely no longer be worked on or improved:
* Relative positional biases
  * There are better alternatives that don't involve explicitly biasing the attention weight matrix; they are more
  performant while providing similar or better accuracy.
* GEMM-based kernels
  * Since FNA covers more features than our unfused GEMM-based kernels, and we know it to be a better solution
    (please refer to Faster Neighborhood Attention for details), we do not plan to extend or improve these kernels.
  * This includes support for varying parameters, causal masking, and 3-D problems.

## License
NATTEN is released under the [MIT License](LICENSE).

## Citation
```bibtex
@misc{hassani2024faster,
  title        = {Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level},
  author       = {Ali Hassani and Wen-Mei Hwu and Humphrey Shi},
  year         = 2024,
  url          = {https://arxiv.org/abs/2403.04690},
  eprint       = {2403.04690},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV}
}
@inproceedings{hassani2023neighborhood,
  title        = {Neighborhood Attention Transformer},
  author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
  year         = 2023,
  booktitle    = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}
@misc{hassani2022dilated,
  title        = {Dilated Neighborhood Attention Transformer},
  author       = {Ali Hassani and Humphrey Shi},
  year         = 2022,
  url          = {https://arxiv.org/abs/2209.15001},
  eprint       = {2209.15001},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV}
}
```

## Acknowledgements
We thank NVIDIA and the [CUTLASS project](https://github.com/NVIDIA/cutlass/) team for their efforts in
creating and open-sourcing CUTLASS. We would also like to thank Haicheng Wu for his valuable feedback and comments, which led to
the creation of GEMM-based NA.
We also thank Meta and the [xFormers](https://github.com/facebookresearch/xformers/) team
for their FMHA kernel, on which our Fused Neighborhood Attention kernel is based.
We thank the [PyTorch](https://github.com/pytorch/pytorch/) project and team.

            
