differential-transformer

Name: differential-transformer
Version: 0.0.3
Home page: https://github.com/kyegomez/DifferentialTransformer
Summary: differential-transformer - Pytorch
Upload time: 2024-10-12 21:07:27
Maintainer: None
Docs URL: None
Author: Kye Gomez
Requires Python: <4.0,>=3.10
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: torch, zetascale, swarms
            
# Differential Transformer 

An open source community implementation of the model from the "DIFFERENTIAL TRANSFORMER" paper by Microsoft. [Paper Link](https://arxiv.org/abs/2410.05258). "Differential attention takes the difference between two softmax attention functions to eliminate attention noise. The idea is analogous to differential amplifiers [19] proposed in electrical engineering, where the difference between two signals is used as output, so that we can null out the common-mode noise of the input. In addition, the design of noise-canceling headphones is based on a similar idea. We can directly reuse FlashAttention [8] as described in Appendix A, which significantly improves model efficiency."
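
To make the quoted idea concrete, here is a minimal, single-head sketch of differential attention in PyTorch. This is not the implementation used by this package; the function name `differential_attention`, the argument shapes, and the fixed scalar `lam` are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F


def differential_attention(q1, k1, q2, k2, v, lam):
    # Two independent softmax attention maps over the same sequence.
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    # Their difference cancels the "common-mode" attention noise shared by both maps.
    return (a1 - lam * a2) @ v


# Illustrative shapes only: batch of 2, sequence length 16, per-group dim 32.
b, n, d = 2, 16, 32
q1, q2, k1, k2 = (torch.randn(b, n, d) for _ in range(4))
v = torch.randn(b, n, 2 * d)
out = differential_attention(q1, k1, q2, k2, v, lam=0.5)
print(out.shape)  # torch.Size([2, 16, 64])
```

In the paper, λ is a learned scalar re-parameterized from learnable vectors rather than the fixed constant used in this sketch, and each head's output is additionally normalized before the output projection.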



[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)


## Install

```bash
$ pip3 install differential-transformer
```

## Usage

```python
import torch
from differential_transformer.main import DifferentialTransformer
from loguru import logger

# Example dimensions
batch_size = 32
seq_len = 128
embedding_dim = 64
h = 8
λ = 0.1
λinit = 0.05

# Random token IDs of shape (batch_size, seq_len)
x = torch.randint(0, 256, (batch_size, seq_len))

# Instantiate the transformer and run a forward pass
model = DifferentialTransformer(heads=h, dim=embedding_dim, λinit=λinit)
output = model(x, λ=λ)

logger.info(f"Output shape: {output.shape}")
```
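
A note on `λinit`: the paper initializes λ depth-dependently, with λ_init = 0.8 − 0.6·exp(−0.3·(l − 1)) for decoder layer index l. The snippet below is a small sketch of that schedule for reference; it is not part of this package's API.

```python
import math


def lambda_init(layer_index: int) -> float:
    # Depth-dependent initialization of lambda from the paper,
    # with layer_index = 1 for the first decoder layer.
    return 0.8 - 0.6 * math.exp(-0.3 * (layer_index - 1))


print([round(lambda_init(l), 3) for l in (1, 2, 12, 24)])
```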

## License

MIT


## Citation


```bibtex
@misc{ye2024differentialtransformer,
    title={Differential Transformer}, 
    author={Tianzhu Ye and Li Dong and Yuqing Xia and Yutao Sun and Yi Zhu and Gao Huang and Furu Wei},
    year={2024},
    eprint={2410.05258},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2410.05258}, 
}

```
            

Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/kyegomez/DifferentialTransformer",
    "name": "differential-transformer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "artificial intelligence, deep learning, optimizers, Prompt Engineering",
    "author": "Kye Gomez",
    "author_email": "kye@apac.ai",
    "download_url": "https://files.pythonhosted.org/packages/97/ee/2c9081c5b6cbf3a903d4d2f4f832fc87380bef8dff0c5e60bbd1cda753b5/differential_transformer-0.0.3.tar.gz",
    "platform": null,
    "description": "\n# Differential Transformer \n\nAn open source community implementation of the model from \"DIFFERENTIAL TRANSFORMER\" paper by Microsoft. [Paper Link](https://arxiv.org/abs/2410.05258). \"Differential attention takes the difference between two softmax attention functions to eliminate attention noise. The idea is analogous to differential amplifiers [19] proposed in electrical engineering,where the difference between two signals is used as output, so that we can null out the common-mode noise of the input. In addition, the design of noise-canceling headphones is based on a similar idea. We can directly reuse FlashAttention [8] as described in Appendix A, which significantly improves model efficiency.\"\n\n\n\n[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)\n\n\n## Install\n\n```bash\n$ pip3 install differential-transformers\n```\n\n## Usage Transformer\n\n```python\n\nimport torch\nfrom differential_transformer.main import DifferentialTransformer\nfrom loguru import logger\n\n# Example usage:\n# Example dimensions\nbatch_size = 32\nseq_len = 128\nembedding_dim = 64\nh = 8\n\u03bb = 0.1\n\u03bbinit = 0.05\n\n# Create random input tensor\nx = torch.randint(0, 256, (1, 1024))\n\n# Instantiate and run the multi-head attention\nmulti_head = DifferentialTransformer(heads=h, dim=embedding_dim, \u03bbinit=\u03bbinit)\noutput = multi_head(x, \u03bb=\u03bb)\n\nlogger.info(f\"Output shape: {output.shape}\")\n\n\n```\n\n# License\nMIT\n\n\n## Citation\n\n\n```bibtex\n@misc{ye2024differentialtransformer,\n    title={Differential Transformer}, \n    author={Tianzhu Ye and Li Dong and Yuqing Xia and Yutao Sun and Yi Zhu and Gao Huang and Furu Wei},\n    year={2024},\n    eprint={2410.05258},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL},\n    url={https://arxiv.org/abs/2410.05258}, \n}\n\n```",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "differential-transformer - Pytorch",
    "version": "0.0.3",
    "project_urls": {
        "Documentation": "https://github.com/kyegomez/DifferentialTransformer",
        "Homepage": "https://github.com/kyegomez/DifferentialTransformer",
        "Repository": "https://github.com/kyegomez/DifferentialTransformer"
    },
    "split_keywords": [
        "artificial intelligence",
        " deep learning",
        " optimizers",
        " prompt engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c9d41eed50fd3298090d975e821df436a4dc572352d1ef978c9d118a4b26edba",
                "md5": "1fa3d1ec48797864b86eb050b52b42b5",
                "sha256": "c99c054b086f6dd668d6dda0b1f66b499e1ad80ad8824a9ca99d11efbf4b5ab2"
            },
            "downloads": -1,
            "filename": "differential_transformer-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1fa3d1ec48797864b86eb050b52b42b5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 5811,
            "upload_time": "2024-10-12T21:07:26",
            "upload_time_iso_8601": "2024-10-12T21:07:26.006153Z",
            "url": "https://files.pythonhosted.org/packages/c9/d4/1eed50fd3298090d975e821df436a4dc572352d1ef978c9d118a4b26edba/differential_transformer-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "97ee2c9081c5b6cbf3a903d4d2f4f832fc87380bef8dff0c5e60bbd1cda753b5",
                "md5": "f713f28bd886136c55f691ac1e6278e5",
                "sha256": "cb6acb67ad9ee80802c4e9a410de65c483e6bb25f7aa993ae255204b3de141f5"
            },
            "downloads": -1,
            "filename": "differential_transformer-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "f713f28bd886136c55f691ac1e6278e5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 5202,
            "upload_time": "2024-10-12T21:07:27",
            "upload_time_iso_8601": "2024-10-12T21:07:27.540955Z",
            "url": "https://files.pythonhosted.org/packages/97/ee/2c9081c5b6cbf3a903d4d2f4f832fc87380bef8dff0c5e60bbd1cda753b5/differential_transformer-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-12 21:07:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kyegomez",
    "github_project": "DifferentialTransformer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "zetascale",
            "specs": []
        },
        {
            "name": "swarms",
            "specs": []
        }
    ],
    "lcname": "differential-transformer"
}
```
        