Name: qattn
Version: 0.1.1
Summary: Efficient GPU Kernels in Triton for Quantized Vision Transformers
Homepage: https://github.com/IBM/qattn
Author email: Piotr Kluska <klu@zurich.ibm.com>, Florian Scheidegger <eid@zurich.ibm.com>, A. Cristiano I. Malossi <acm@zurich.ibm.com>, Enrique S. Quintana-Ortí <quintana@disca.upv.es>
Upload time: 2024-06-21 17:27:15
Requires Python: >=3.8
Keywords: quantization, vision transformers, efficient, gpu
Requirements: No requirements were recorded.
License: MIT License. Copyright (c) 2024 International Business Machines. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# QAttn

Welcome to the QAttn documentation! QAttn (pronounced like katana) is a [Python](https://docs.python.org/3)-only framework with GPU kernels implemented in [Triton](https://triton-lang.org/) for quantized vision transformers. The framework implements integer and mixed-precision kernels for operations within vision transformers (currently matrix multiplication and attention), supporting both static and dynamic quantization.
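To make the idea of an integer Triton kernel concrete, here is a minimal, illustrative sketch of a statically quantized INT8 matrix multiplication with a single combined dequantization scale. This is *not* QAttn's actual kernel: the kernel name, block sizes, and the `scale` parameter (assumed to be `scale_a * scale_b` from per-tensor quantization) are assumptions made for illustration.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def int8_matmul_kernel(
    a_ptr, b_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    scale,  # combined per-tensor dequantization scale (assumed scale_a * scale_b)
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    # Each program instance computes one BLOCK_M x BLOCK_N tile of C.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    rk = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + rm[:, None] * stride_am + rk[None, :] * stride_ak
    b_ptrs = b_ptr + rk[:, None] * stride_bk + rn[None, :] * stride_bn
    # Accumulate in INT32 so INT8 products cannot overflow.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.int32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs, mask=(rm[:, None] < M) & (rk[None, :] + k < K), other=0)
        b = tl.load(b_ptrs, mask=(rk[:, None] + k < K) & (rn[None, :] < N), other=0)
        acc += tl.dot(a, b)  # INT8 x INT8 -> INT32
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    # Dequantize once at the end with the combined scale.
    c = acc.to(tl.float32) * scale
    c_ptrs = c_ptr + rm[:, None] * stride_cm + rn[None, :] * stride_cn
    tl.store(c_ptrs, c, mask=(rm[:, None] < M) & (rn[None, :] < N))


def int8_matmul(a: torch.Tensor, b: torch.Tensor, scale: float) -> torch.Tensor:
    """C = (a_int8 @ b_int8) * scale, returned as float32."""
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    int8_matmul_kernel[grid](
        a, b, c, M, N, K,
        a.stride(0), a.stride(1),
        b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
        scale,
        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32,
    )
    return c
```

The key design point the sketch shows is why static quantization pays off: the inner loop runs entirely in integer arithmetic, and the single floating-point multiply happens once per output element after accumulation.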

## Installation

To install the package, run

```bash
pip install qattn
```

or install from source to get the latest bleeding-edge version:

```bash
pip install git+https://github.com/ibm/qattn.git
```

This package depends on Triton, which requires an NVIDIA GPU (preferably Ampere or newer), and is tested only on Linux.


To modify the source code, clone the repository locally and install it in editable mode:

```bash
git clone https://github.com/ibm/qattn.git
cd qattn
pip install -e .
```

## Usage

In the [Examples](examples) section, we present static and dynamic quantization samples using QAttn. QAttn is designed to be compatible with PyTorch FX quantization, dynamically replacing floating-point modules in a model's traced graph with quantized ones. The downside of this approach is that FX tracing cannot capture control-flow statements in the graph.
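As a rough sketch of what an FX-based static quantization flow looks like, the snippet below uses only PyTorch's standard FX quantization entry points on an FX-traceable vision transformer (torchvision's ViT-B/16 is used here as an assumed stand-in). The QAttn-specific step that swaps in its Triton-backed quantized modules is deliberately not shown; see the Examples section for the actual API.

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx
import torchvision

# Any FX-traceable vision transformer works; ViT-B/16 is an assumed example.
model = torchvision.models.vit_b_16(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Symbolically trace the model and insert observers for static quantization.
qconfig_mapping = get_default_qconfig_mapping()
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate on representative data so the observers record activation ranges.
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(8, 3, 224, 224))

# Replace the observed floating-point modules in the graph with quantized ones.
quantized = convert_fx(prepared)
```

Because `prepare_fx` relies on symbolic tracing, any data-dependent `if`/`for` in the model's forward pass will fail to trace, which is the control-flow limitation mentioned above.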

## Future direction

In the future, we will support the remaining basic Vision Transformer operations (GELU, LayerNorm, Add, etc.) for fully quantized models. Next, we will move to the PyTorch 2.0 TorchDynamo graph capture to enable integration with `torch.compile`.

## Citation

If you use this project in your research paper or thesis, we would appreciate it if you used the following citation:

```bibtex
@InProceedings{Kluska_2024_CVPR,
    author    = {Kluska, Piotr and Castell\'o, Adri\'an and Scheidegger, Florian and Malossi, A. Cristiano I. and Quintana-Ort{\'\i}, Enrique S.},
    title     = {QAttn: Efficient GPU Kernels for Mixed-precision Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {3648-3657}
}
```

## Acknowledgments

> This work was conducted within the APROPOS project, which has received funding from the European Union’s Horizon 2020 (H2020) Marie Skłodowska-Curie Innovative Training Networks (H2020-MSCA-ITN-2020) call under Grant Agreement no. 956090. Project link: https://apropos-project.eu/

            
