## An efficient Linear Attention Decoding package
### 1. installation
```bash
conda create -n efficient_linear_decoding python=3.9
conda activate efficient_linear_decoding
pip install efficient_linear_decoding
```
The code has been tested under the following environment:
```text
triton>=2.1.0
torch>=2.1.0
pycuda
pynvml
numpy<2
```
You can use the following commands to install the dependencies:
```bash
pip install triton==2.1.0
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install pycuda
pip install pynvml
pip install "numpy<2"
```
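Before running, it can help to confirm the machine actually matches this environment. The sketch below uses only the standard library; the minimum versions are copied from the list above, and the helper itself is my own addition, not part of the package:

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the environment list above.
# None means "any version is accepted" here.
REQUIRED = {"triton": (2, 1, 0), "torch": (2, 1, 0), "pycuda": None, "pynvml": None}


def parse_version(v):
    """Turn a version string like '2.1.0+cu118' into (2, 1, 0) for tuple comparison.

    This is a deliberately simple parser: it drops local suffixes after '+'
    and assumes the first three dot-separated parts are integers.
    """
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])


def check_environment(required=REQUIRED):
    """Return a list of human-readable problems; empty list means all good."""
    problems = []
    for pkg, minimum in required.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed")
            continue
        if minimum is not None and parse_version(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{pkg}: found {installed}, need >= {wanted}")
    return problems
```

Calling `check_environment()` on a correctly set up machine should return an empty list.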
### 2. usage
```python
import torch
from efficient_linear_decoding.efficient_linear_decoding import causal_linear_decoder
# Create input tensors of shape (batch, heads, sequence_length, head_dim)
Q = torch.randn(2,32,1024,128,device='cuda:0')
K = torch.randn(2,32,1024,128,device='cuda:0')
V = torch.randn(2,32,1024,128,device='cuda:0')
# Inference using causal_linear_decoder
output = causal_linear_decoder(Q,K,V)
# To apply a weighted (decayed) causal mask, set is_mask_weight=True and pass a per-head decay factor gamma
gamma = torch.full((32,),0.5,device='cuda:0')
output = causal_linear_decoder(Q,K,V,is_mask_weight=True,gamma=gamma)
```
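As a sanity check, the decoder's output can be compared against a naive reference. The recurrence below is the standard formulation of causal linear attention (a running state `S_t` accumulates `K_t^T V_t`, and each output is `Q_t S_t`); the per-head decay `gamma` is my reading of what `is_mask_weight` does, so treat the decayed variant as an assumption rather than this package's documented semantics:

```python
import numpy as np


def causal_linear_attention_ref(Q, K, V, gamma=None):
    """Naive reference for causal linear attention.

    Recurrence per head h:
        S_t = g_h * S_{t-1} + K_t^T V_t
        O_t = Q_t @ S_t
    With gamma=None (g_h = 1) this equals masked linear attention:
        O = tril(Q K^T) V
    """
    B, H, T, D = Q.shape
    E = V.shape[-1]
    out = np.empty((B, H, T, E), dtype=V.dtype)
    S = np.zeros((B, H, D, E), dtype=V.dtype)  # running K^T V state per head
    g = np.ones(H) if gamma is None else np.asarray(gamma)
    for t in range(T):
        # Decay the state, then accumulate the rank-1 update K_t^T V_t.
        S = g[None, :, None, None] * S + np.einsum("bhd,bhe->bhde", K[:, :, t], V[:, :, t])
        out[:, :, t] = np.einsum("bhd,bhde->bhe", Q[:, :, t], S)
    return out
```

With small tensors, the `gamma=None` output should match the explicit masked product `tril(Q K^T) V`, which makes this a convenient correctness check against `causal_linear_decoder` on a CUDA machine.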
### 3. acknowledgement
|method|Title|Paper|Code|
|---|---|---|---|
|causal_dot_product|Fast Transformers with Clustered Attention|[arxiv](https://arxiv.org/abs/2007.04825) |[code](https://github.com/idiap/fast-transformers/tree/master/fast_transformers/causal_product)|
|Lightning Attention-2|Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models|[arxiv](https://arxiv.org/abs/2401.04658)|[code](https://github.com/OpenNLPLab/lightning-attention)|