fast-hadamard-transform


- Name: fast-hadamard-transform
- Version: 1.0.4.post1
- Home page: https://github.com/Dao-AILab/fast-hadamard-transform
- Summary: Fast Hadamard Transform in CUDA, with a PyTorch interface
- Upload time: 2024-02-13 05:49:17
- Author: Tri Dao
- Requires Python: >=3.7

# Fast Hadamard Transform in CUDA, with a PyTorch interface

Features:
- Supports fp32, fp16, and bf16 inputs whose last dimension is at most 32768.
- Implicitly pads with zeros up to the next power of 2 if the dimension is not a power of 2.
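
To make the padding rule and the O(n log n) cost concrete, here is a minimal CPU sketch of the classic in-place fast Walsh-Hadamard butterfly applied to a single row. It is not the package's CUDA kernel, and `next_power_of_two` and `fwht_padded` are hypothetical names. It assumes, following the docstring's `out: (..., dim)`, that only the first `dim` outputs are kept after padding. The butterfly order used here yields the same (Sylvester, natural) ordering as `scipy.linalg.hadamard`.

```python
from typing import List


def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def fwht_padded(x: List[float], scale: float = 1.0) -> List[float]:
    """Fast Walsh-Hadamard transform of one row, zero-padded to a power of 2.

    Assumption (from the docstring's `out: (..., dim)`): only the first
    len(x) outputs are returned.
    """
    dim = len(x)
    n = next_power_of_two(dim)
    buf = list(x) + [0.0] * (n - dim)  # implicit zero padding
    h = 1
    while h < n:  # log2(n) butterfly stages, each O(n)
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = buf[i], buf[i + h]
                buf[i], buf[i + h] = a + b, a - b
        h *= 2
    return [v * scale for v in buf[:dim]]


print(fwht_padded([1.0, 2.0, 3.0]))  # [6.0, 2.0, 0.0]
```

Here `[1.0, 2.0, 3.0]` is padded to `[1.0, 2.0, 3.0, 0.0]`, transformed with the 4-by-4 Hadamard matrix, and truncated back to three outputs.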

## How to use

```python
from fast_hadamard_transform import hadamard_transform
```

```python
def hadamard_transform(x, scale=1.0):
    """
    Arguments:
        x: (..., dim)
        scale: float. Multiply the output by this number.
    Returns:
        out: (..., dim)

    Multiply each row of x by the Hadamard transform matrix.
    Equivalent to F.linear(x, torch.tensor(scipy.linalg.hadamard(dim))) * scale.
    If dim is not a power of 2, we implicitly pad x with zeros up to the next power of 2.
    """
```
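
The docstring defines the output through a dense matrix product. For checking results on small inputs, a pure-Python reference of that product may help. This is a sketch that assumes a power-of-2 `dim` (as `scipy.linalg.hadamard` requires); `sylvester_hadamard` and `dense_hadamard_reference` are hypothetical helpers, not part of the package. The matrix is built with Sylvester's construction, which is the construction `scipy.linalg.hadamard` documents, and `F.linear(x, W)` computes `x @ W.T`, i.e. `out[i] = sum_j W[i][j] * x[j]`.

```python
from typing import List

Matrix = List[List[int]]


def sylvester_hadamard(n: int) -> Matrix:
    """n-by-n Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        # H_{2m} = [[H_m, H_m], [H_m, -H_m]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def dense_hadamard_reference(x: List[float], scale: float = 1.0) -> List[float]:
    """O(n^2) reference: out[i] = scale * sum_j H[i][j] * x[j], like F.linear(x, H) * scale."""
    h = sylvester_hadamard(len(x))
    return [scale * sum(hij * xj for hij, xj in zip(row, x)) for row in h]


x = [1.0, 2.0, 3.0, 4.0]
print(dense_hadamard_reference(x))  # [10.0, -2.0, -4.0, 0.0]
y = dense_hadamard_reference(x, scale=0.5)  # 0.5 == 1/sqrt(4): orthonormal scaling
print(dense_hadamard_reference(y, scale=0.5))  # [1.0, 2.0, 3.0, 4.0]
```

With `scale = 1/sqrt(dim)` the transform is orthonormal and is its own inverse, because the Sylvester matrix is symmetric and satisfies H·H = dim·I; the second `print` relies on that property.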

## Speed

Benchmarked on an A100 with a sufficiently large batch size, relative to a
memcpy (`torch.clone`). A memcpy is a lower bound on the runtime, since the
transform has to read its inputs from GPU memory and write its outputs back to
GPU memory anyway.

| Data type |  Dimension | Time taken vs memcpy |
| --------- | ---------- | -------------------- |
| fp16/bf16 |     <= 512 |                 1.0x |
|           | 512 - 8192 |              <= 1.2x |
|           |      16384 |                 1.3x |
|           |      32768 |                 1.8x |
| fp32      |    <= 8192 |                 1.0x |
|           |      16384 |                 1.1x |
|           |      32768 |                 1.2x |
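
The memcpy baseline is a bandwidth bound: every element is read once and written once. The sketch below turns that into arithmetic and scales it by a ratio from the table. The bandwidth value is an assumption supplied by the caller (the README's baseline is a measured `torch.clone`, not this formula), and `memcpy_lower_bound_us` and `estimated_time_us` are hypothetical helpers; the estimate only holds to the extent that the assumed bandwidth matches what `torch.clone` actually achieves.

```python
def memcpy_lower_bound_us(batch: int, dim: int, bytes_per_elem: int,
                          bandwidth_gb_s: float) -> float:
    """Lower bound in microseconds for reading a (batch, dim) tensor and writing one back.

    bandwidth_gb_s is an ASSUMED effective memory bandwidth in GB/s (1e9 bytes/s).
    """
    bytes_moved = 2 * batch * dim * bytes_per_elem  # one full read + one full write
    return bytes_moved / (bandwidth_gb_s * 1e9) * 1e6


def estimated_time_us(lower_bound_us: float, ratio_vs_memcpy: float) -> float:
    """Scale the bound by a 'Time taken vs memcpy' ratio from the table."""
    return lower_bound_us * ratio_vs_memcpy


# Hypothetical workload: 4096 rows of dim 32768 in fp16 (2 bytes per element),
# with an assumed 1500 GB/s effective bandwidth (illustrative, not measured).
bound = memcpy_lower_bound_us(4096, 32768, 2, 1500.0)
print(f"memcpy bound: {bound:.1f} us; fp16 at 32768 (1.8x): {estimated_time_us(bound, 1.8):.1f} us")
```

Because the bound grows linearly with `batch * dim * bytes_per_elem`, the ratios in the table are a more portable summary than absolute times.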

            
