selfextend

Name: selfextend
Version: 0.0.1
Home page: https://github.com/kyegomez/SelfExtend
Summary: SelfExtendAttn - Pytorch
Upload time: 2024-01-03 22:30:13
Author: Kye Gomez
Requires Python: >=3.6,<4.0
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: none recorded

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# SelfExtendAttn
Implementation of SelfExtendAttn from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning", in PyTorch and Zeta. This implementation is based mostly on the pseudocode listed in Algorithm 1 on page 4 of the paper.


# Install
`pip install selfextend`


## Usage
```python
import torch
from se_attn import SelfExtendAttn

# Example usage
dim = 512  # Dimension of model
g_size = 2  # Group size
w_size = 4  # Window size for neighbor tokens
self_extend = SelfExtendAttn(dim, g_size, w_size, qk_norm=True)

# Example tensors for queries, keys, values, and positional indices
q = torch.randn(1, 10, dim)  # queries: (batch, seq_len, dim)
k = torch.randn(1, 10, dim)  # keys:    (batch, seq_len, dim)
v = torch.randn(1, 10, dim)  # values:  (batch, seq_len, dim)
pos = torch.arange(0, 10).unsqueeze(0)  # positional indices: (1, seq_len)

output = self_extend(q, k, v, pos)
print(output)
```

---

## Technical Architecture

### Key Concepts

- **Grouped Attention**: Rather than computing attention with exact positions for every pair of tokens, this mechanism applies a floor (integer) division to token positions so that groups of neighboring tokens share a single, coarser position index. Distant relative positions are thereby mapped back into the range seen during pretraining, which lets longer sequences be handled without fine-tuning (see the sketch after this list).
  
- **Normal Attention**: Standard self-attention used in transformers, focusing on nearby tokens within a specified window.
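
The grouped positional remapping can be illustrated with a few lines of plain PyTorch. This is a minimal sketch of the floor operation only, assuming a group size of 2; it is not the module's internal code:

```python
import torch

# Floor-divide token positions by the group size: each run of `g_size`
# consecutive tokens shares one coarse position index, so large positions
# collapse back into a smaller range the model has already seen.
g_size = 2
pos = torch.arange(12)        # 0, 1, 2, ..., 11
grouped_pos = pos // g_size   # 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
print(grouped_pos.tolist())
```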

### Attention Mechanism

The `SelfExtendAttn` module integrates these two attention strategies:

1. **Normal Attention** is applied to tokens within a neighborhood window, maintaining precise positional information for closely related tokens.
   
2. **Grouped Attention** is used for tokens outside this neighborhood window. It reduces the granularity of positional information for distant tokens, which is less critical but still contributes to the overall context understanding.
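
To make the split concrete, here is a hedged, standalone sketch (not the module's forward pass) of how each query's keys are partitioned into the two regimes for a causal model with `w_size = 4`:

```python
import torch

seq_len, w_size = 8, 4
q_pos = torch.arange(seq_len).unsqueeze(1)   # query positions as a column
k_pos = torch.arange(seq_len).unsqueeze(0)   # key positions as a row
rel = q_pos - k_pos                          # relative distance; >= 0 for causal pairs

in_window = (rel >= 0) & (rel < w_size)      # handled by normal attention
in_grouped = rel >= w_size                   # handled by grouped attention
print(in_window.int())
print(in_grouped.int())
```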

### Merge Strategy

The attention values outside the neighborhood window are replaced by those obtained from the grouped attention. This merging strategy ensures a smooth transition and efficient processing of longer sequences while preserving the essential context captured by the normal attention within the neighborhood window.
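
On relative positions, the merge can be sketched as follows, assuming the window keeps exact distances and the grouped part is floor-divided and shifted so the two pieces meet at the window edge (illustrative values only):

```python
import torch

g_size, w_size = 2, 4
rel = torch.arange(12)                                  # relative distances 0..11
grouped = rel // g_size + (w_size - w_size // g_size)   # coarse distances, shifted to the window edge
merged = torch.where(rel < w_size, rel, grouped)        # exact nearby, grouped far away
print(merged.tolist())  # [0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```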

### Positional Encoding

Sine and cosine functions generate positional encodings, ensuring that the model retains an understanding of token order and position.
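
For reference, a standard sine/cosine positional-encoding table can be built as below; this is the common transformer formulation and may differ in detail from the encoding used inside the package:

```python
import math

import torch

def sinusoidal_encoding(seq_len: int, dim: int) -> torch.Tensor:
    """Return a (seq_len, dim) table of sine/cosine positional encodings."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim))
    enc = torch.zeros(seq_len, dim)
    enc[:, 0::2] = torch.sin(pos * div)   # even channels use sine
    enc[:, 1::2] = torch.cos(pos * div)   # odd channels use cosine
    return enc

print(sinusoidal_encoding(10, 512).shape)  # torch.Size([10, 512])
```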

## Implementation Details

- **Module Class**: `SelfExtendAttn` is implemented as a subclass of `nn.Module` in PyTorch.
- **Configurability**: Key parameters such as group size and neighbor window size are configurable.
- **Causal Masking**: Ensures that the attention mechanism respects the autoregressive property of language models (a minimal sketch follows).
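
As a quick illustration of causal masking (a generic sketch, not necessarily how the module builds its mask), future positions are set to `-inf` before the softmax:

```python
import torch

seq_len = 6
scores = torch.randn(seq_len, seq_len)                              # raw attention logits
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool)) # lower-triangular mask
scores = scores.masked_fill(~causal, float("-inf"))                 # block attention to future tokens
weights = scores.softmax(dim=-1)                                    # each row sums to 1
print(weights)
```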



# Citation
```bibtex
@misc{jin2024llm,
    title={LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning}, 
    author={Hongye Jin and Xiaotian Han and Jingfeng Yang and Zhimeng Jiang and Zirui Liu and Chia-Yuan Chang and Huiyuan Chen and Xia Hu},
    year={2024},
    eprint={2401.01325},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

# License
MIT
            
