leetDecoding

Name	leetDecoding
Version	0.0.2
home_page	https://github.com/Computational-Machine-Intelligence/efficient_linear_decoding
Summary	Efficient computation library for linear attention.
upload_time	2024-09-18 06:45:25
maintainer	None
docs_url	None
author	Jiaping Wang
requires_python	>=3.8
license	MIT Licence
keywords	pip leetdecoding leetdecoding efficient_linear_decoding
requirements	No requirements were recorded.

## An efficient Linear Attention Decoding package

### 1. installation

```bash
conda create -n leetDecoding python==3.9
conda activate leetDecoding
pip install leetDecoding
```

The code has been tested in the following environment:
```text
triton>=2.1.0
torch>=2.1.0
pycuda
pynvml
numpy<2
```
You can install these dependencies with the following commands:
```bash
pip install triton==2.1.0
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install pycuda
pip install pynvml
pip install "numpy<2"
```
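After installing, it can help to confirm what actually ended up in the environment. The sketch below is not part of leetDecoding; it uses only the standard library's `importlib.metadata` to report the installed version of each package in the tested-environment list above. The helper names `installed_versions` and `major` are hypothetical, and the only check it performs is the `numpy<2` constraint.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the tested-environment list above.
PACKAGES = ["triton", "torch", "pycuda", "pynvml", "numpy"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major(version_string):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    versions = installed_versions()
    for name, ver in versions.items():
        print(f"{name:8s} {ver or 'not installed'}")
    numpy_version = versions["numpy"]
    if numpy_version is not None and major(numpy_version) >= 2:
        print("numpy is 2.x or newer; the tested environment lists numpy<2")
```

Run it inside the activated `leetDecoding` conda environment so that it inspects the same interpreter that `pip` installed into.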

### 2. usage

```python
import torch
from leetDecoding.efficient_linear_decoding import causal_linear_decoder

# Create the input tensors on the GPU
Q = torch.randn(2, 32, 1024, 128, device='cuda:0')
K = torch.randn(2, 32, 1024, 128, device='cuda:0')
V = torch.randn(2, 32, 1024, 128, device='cuda:0')

# Run inference with causal_linear_decoder
output = causal_linear_decoder(Q, K, V)

# To use a weighted mask whose weights are exp(-gamma), set is_mask_weight=True and is_need_exp=True
gamma = torch.full((32,), 0.5, device='cuda:0')
output = causal_linear_decoder(Q, K, V, is_mask_weight=True, gamma=gamma, is_need_exp=True)

# To use a weighted mask without the exp(-gamma) transform, set is_mask_weight=True and is_need_exp=False
gamma = torch.full((32,), 0.5, device='cuda:0')
output = causal_linear_decoder(Q, K, V, is_mask_weight=True, gamma=gamma, is_need_exp=False)

# To select a specific method, such as FleetAttention, set attn_method='FleetAttention'
output = causal_linear_decoder(Q, K, V, is_mask_weight=False, attn_method='FleetAttention')
```
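For intuition about what a causal linear attention decoder computes, here is a dependency-free sketch for a single head on plain Python lists. It is not leetDecoding's implementation. It assumes the common unnormalized formulation, in which output `i` is the sum over `j <= i` of `decay**(i-j) * (q_i . k_j) * v_j`, and it assumes a weighted mask means a per-head decay of `exp(-gamma)`; the README does not state either detail. The function names are hypothetical. The sketch checks that the quadratic masked form and the linear-time recurrent form agree.

```python
import math
import random


def masked_form(q, k, v, decay=1.0):
    """Quadratic form: o_i = sum_{j<=i} decay**(i-j) * (q_i . k_j) * v_j."""
    n, d_v = len(q), len(v[0])
    out = []
    for i in range(n):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal mask: only positions j <= i
            score = decay ** (i - j) * sum(a * b for a, b in zip(q[i], k[j]))
            for c in range(d_v):
                row[c] += score * v[j][c]
        out.append(row)
    return out


def recurrent_form(q, k, v, decay=1.0):
    """Linear form: S_i = decay * S_{i-1} + k_i^T v_i, then o_i = q_i S_i."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # decayed running sum of k^T v
    out = []
    for q_i, k_i, v_i in zip(q, k, v):
        for a in range(d_k):
            for c in range(d_v):
                state[a][c] = decay * state[a][c] + k_i[a] * v_i[c]
        out.append([sum(q_i[a] * state[a][c] for a in range(d_k))
                    for c in range(d_v)])
    return out


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of standard-normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
q, k, v = (random_matrix(6, 4, rng) for _ in range(3))
decay = math.exp(-0.5)  # assumed mask weight exp(-gamma) with gamma = 0.5

for lam in (1.0, decay):  # 1.0 corresponds to a plain causal mask
    a = masked_form(q, k, v, lam)
    b = recurrent_form(q, k, v, lam)
    assert all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-9)
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

The recurrent form keeps only a `d_k x d_v` state per head, which is why linear attention can decode in time linear in sequence length. A GPU library batches the same idea over heads and blocks of positions.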


### 3. acknowledgement
|Method|Title|Paper|Code|
|---|---|---|---|
|causal_dot_product|Fast Transformers with Clustered Attention|[arxiv](https://arxiv.org/abs/2007.04825)|[code](https://github.com/idiap/fast-transformers/tree/master/fast_transformers/causal_product)|
|Lightning Attention-2|Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models|[arxiv](https://arxiv.org/abs/2401.04658)|[code](https://github.com/OpenNLPLab/lightning-attention)|

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Computational-Machine-Intelligence/efficient_linear_decoding",
    "name": "leetDecoding",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "pip, leetDecoding, LeetDecoding, efficient_linear_decoding",
    "author": "Jiaping Wang",
    "author_email": "wjp666.s@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/47/f5/cd6b7ac1d285cbf4147a07baa235e873efc159f1ba605ecab4e1791d09e9/leetDecoding-0.0.2.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "MIT Licence",
    "summary": "Efficient computation library for linear attention.",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/Computational-Machine-Intelligence/efficient_linear_decoding"
    },
    "split_keywords": [
        "pip",
        " leetdecoding",
        " leetdecoding",
        " efficient_linear_decoding"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "47f5cd6b7ac1d285cbf4147a07baa235e873efc159f1ba605ecab4e1791d09e9",
                "md5": "7b0536519714cd83b077e168346edcb7",
                "sha256": "dabfafcd210119037569219ee3327965d6acafdcc4dc6c9dc01160200b7edfa3"
            },
            "downloads": -1,
            "filename": "leetDecoding-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "7b0536519714cd83b077e168346edcb7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 26143,
            "upload_time": "2024-09-18T06:45:25",
            "upload_time_iso_8601": "2024-09-18T06:45:25.064023Z",
            "url": "https://files.pythonhosted.org/packages/47/f5/cd6b7ac1d285cbf4147a07baa235e873efc159f1ba605ecab4e1791d09e9/leetDecoding-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-18 06:45:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Computational-Machine-Intelligence",
    "github_project": "efficient_linear_decoding",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "leetdecoding"
}
