nvidia-cusparselt-cu13

Name: nvidia-cusparselt-cu13
Version: 0.8.0
Home page: https://developer.nvidia.com/cusparselt
Summary: NVIDIA cuSPARSELt
Upload time: 2025-08-13 19:22:40
Maintainer: None
Docs URL: None
Author: NVIDIA Corporation
Requires Python: None
License: NVIDIA Proprietary Software
Keywords: cuda, nvidia, machine learning, high-performance computing
Requirements: No requirements were recorded.

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

   D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias)

where :math:`op(A)`/:math:`op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta` are scalars or vectors.

The *cuSPARSELt APIs* allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
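
The call flow implied by this description can be sketched as follows: describe the three matrices, pick :math:`op(A)`/:math:`op(B)` and the compute type, optionally attach an epilogue (activation, bias), and build an execution plan. The fragment below is a minimal, illustrative sketch assembled from the API and enum names used in the linked samples and recent releases; exact signatures and enum names can differ between cuSPARSELt versions, so treat the official documentation as authoritative. Error checking is omitted.

.. code-block:: cpp

   #include <cusparseLt.h>   // cuSPARSELt public API
   #include <cstdint>

   // Sketch: build a matmul plan for FP16 2:4-sparse A x dense B, accumulating
   // in FP32, with a ReLU + bias epilogue. The caller owns the descriptor
   // objects so that they outlive the plan; m, n, k are the GEMM dimensions
   // and d_bias is a device pointer.
   void build_sparse_matmul_plan(cusparseLtHandle_t*             handle,
                                 cusparseLtMatDescriptor_t*      matA,
                                 cusparseLtMatDescriptor_t*      matB,
                                 cusparseLtMatDescriptor_t*      matC,
                                 cusparseLtMatmulDescriptor_t*   matmul,
                                 cusparseLtMatmulAlgSelection_t* alg_sel,
                                 cusparseLtMatmulPlan_t*         plan,
                                 int64_t m, int64_t n, int64_t k,
                                 void* d_bias)
   {
       cusparseLtInit(handle);

       // A carries the 2:4 structured sparsity; B and C/D are dense, row-major.
       cusparseLtStructuredDescriptorInit(handle, matA, m, k, /*ld=*/k,
                                          /*alignment=*/16, CUDA_R_16F,
                                          CUSPARSE_ORDER_ROW,
                                          CUSPARSELT_SPARSITY_50_PERCENT);
       cusparseLtDenseDescriptorInit(handle, matB, k, n, n, 16,
                                     CUDA_R_16F, CUSPARSE_ORDER_ROW);
       cusparseLtDenseDescriptorInit(handle, matC, m, n, n, 16,
                                     CUDA_R_16F, CUSPARSE_ORDER_ROW);

       // op(A)/op(B) selection and the compute (accumulator) type.
       cusparseLtMatmulDescriptorInit(handle, matmul,
                                      CUSPARSE_OPERATION_NON_TRANSPOSE,
                                      CUSPARSE_OPERATION_NON_TRANSPOSE,
                                      matA, matB, matC, matC,
                                      CUSPARSE_COMPUTE_32F);

       // Epilogue: ReLU activation and a bias vector.
       int relu = 1;
       cusparseLtMatmulDescSetAttribute(handle, matmul,
                                        CUSPARSELT_MATMUL_ACTIVATION_RELU,
                                        &relu, sizeof(relu));
       cusparseLtMatmulDescSetAttribute(handle, matmul,
                                        CUSPARSELT_MATMUL_BIAS_POINTER,
                                        &d_bias, sizeof(d_bias));

       // Algorithm selection and plan creation. Note that A still has to be
       // pruned and compressed before cusparseLtMatmul() can run; see the
       // sketch under Key Features below.
       cusparseLtMatmulAlgSelectionInit(handle, alg_sel, matmul,
                                        CUSPARSELT_MATMUL_ALG_DEFAULT);
       cusparseLtMatmulPlanInit(handle, plan, matmul, alg_sel);
   }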

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

**Examples**:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

**Blog posts**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | Input A/B    | Input C        | Output D        | Compute     | Block scaled                    | Supported SM archs |
    +==============+================+=================+=============+=================================+====================+
    | `FP32`       | `FP32`         | `FP32`          | `FP32`      | No                              |                    |
    +--------------+----------------+-----------------+-------------+                                 +                    |
    | `BF16`       | `BF16`         | `BF16`          | `FP32`      |                                 | `8.0, 8.6, 8.7`    |
    +--------------+----------------+-----------------+-------------+                                 + `9.0, 10.0, 10.1`  |
    | `FP16`       | `FP16`         | `FP16`          | `FP32`      |                                 | `11.0, 12.0, 12.1` |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `FP16`       | `FP16`         | `FP16`          | `FP16`      | No                              | `9.0`              |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `INT8`       | `INT8`         | `INT8`          | `INT32`     | No                              |                    |
    +              +----------------+-----------------+             +                                 + `8.0, 8.6, 8.7`    +
    |              | `INT32`        | `INT32`         |             |                                 | `9.0, 10.0, 10.1`  |
    +              +----------------+-----------------+             +                                 + `11.0, 12.0, 12.1` +
    |              | `FP16`         | `FP16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `BF16`         | `BF16`          |             |                                 |                    |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      | No                              | `9.0, 10.0, 10.1`  |
    +              +----------------+-----------------+             +                                 + `11.0, 12.0, 12.1` +
    |              | `BF16`         | `E4M3`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP16`         | `FP16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `BF16`         | `BF16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP32`         | `FP32`          |             |                                 |                    |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      | No                              | `9.0, 10.0, 10.1`  |
    +              +----------------+-----------------+             +                                 + `11.0, 12.0, 12.1` +
    |              | `BF16`         | `E5M2`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP16`         | `FP16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `BF16`         | `BF16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP32`         | `FP32`          |             |                                 |                    |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      | A/B/D_OUT_SCALE = `VEC64_UE8M0` | `10.0, 10.1, 11.0` |
    +              +----------------+-----------------+             +                                 + `12.0, 12.1`       +
    |              | `BF16`         | `E4M3`          |             | D_SCALE = `32F`                 |                    |
    +              +----------------+-----------------+             +---------------------------------+                    +
    |              | `FP16`         | `FP16`          |             | A/B_SCALE = `VEC64_UE8M0`       |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `BF16`         | `BF16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP32`         | `FP32`          |             |                                 |                    |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+
    | `E2M1`       | `FP16`         | `E2M1`          | `FP32`      | A/B/D_SCALE = `VEC32_UE4M3`     | `10.0, 10.1, 11.0` |
    +              +----------------+-----------------+             +                                 + `12.0, 12.1`       +
    |              | `BF16`         | `E2M1`          |             | D_SCALE = `32F`                 |                    |
    +              +----------------+-----------------+             +---------------------------------+                    +
    |              | `FP16`         | `FP16`          |             | A/B_SCALE = `VEC32_UE4M3`       |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `BF16`         | `BF16`          |             |                                 |                    |
    +              +----------------+-----------------+             +                                 +                    +
    |              | `FP32`         | `FP32`          |             |                                 |                    |
    +--------------+----------------+-----------------+-------------+---------------------------------+--------------------+

* Matrix pruning and compression functionalities (see the sketch after this list)
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see `cusparseLtMatmulSearch()`)
* NVTX ranging and logging functionalities
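
As a companion to the list above, the fragment below sketches how the pruning, compression, auto-tuning, and matmul steps fit together, continuing from a plan built as in the earlier sketch. It is illustrative only: the call names follow the linked samples and recent releases (`cusparseLtSpMMAPrune`, `cusparseLtSpMMACompress`, `cusparseLtMatmulSearch`, `cusparseLtMatmul`), but exact signatures may vary across cuSPARSELt versions, and all error handling is omitted.

.. code-block:: cpp

   #include <cusparseLt.h>
   #include <cuda_runtime_api.h>

   // Sketch: prune A to the 2:4 pattern, compress it, auto-tune, and execute
   // D = Activation(alpha * op(A) * op(B) + beta * op(C) + bias).
   // handle/matmul/plan come from a setup step such as the one sketched above;
   // dA, dB, dC, dD are device buffers allocated and filled elsewhere.
   void prune_compress_tune_and_run(cusparseLtHandle_t*           handle,
                                    cusparseLtMatmulDescriptor_t* matmul,
                                    cusparseLtMatmulPlan_t*       plan,
                                    void* dA, const void* dB,
                                    const void* dC, void* dD,
                                    cudaStream_t stream)
   {
       float alpha = 1.0f, beta = 0.0f;

       // Prune A in place to 2:4 structured sparsity.
       cusparseLtSpMMAPrune(handle, matmul, dA, dA,
                            CUSPARSELT_PRUNE_SPMMA_STRIP, stream);

       // Compress the pruned A into the compact layout consumed by the kernels.
       size_t compressed_size = 0, compress_buffer_size = 0;
       cusparseLtSpMMACompressedSize(handle, plan,
                                     &compressed_size, &compress_buffer_size);
       void* dA_compressed  = nullptr;
       void* d_compress_buf = nullptr;
       cudaMalloc(&dA_compressed,  compressed_size);
       cudaMalloc(&d_compress_buf, compress_buffer_size);
       cusparseLtSpMMACompress(handle, plan, dA, dA_compressed,
                               d_compress_buf, stream);

       // Workspace required by the selected kernel.
       size_t workspace_size = 0;
       cusparseLtMatmulGetWorkspace(handle, plan, &workspace_size);
       void* d_workspace = nullptr;
       cudaMalloc(&d_workspace, workspace_size);

       // Optional auto-tuning: benchmarks the available kernels and stores the
       // best configuration inside the plan.
       cusparseLtMatmulSearch(handle, plan, &alpha, dA_compressed, dB,
                              &beta, dC, dD, d_workspace, &stream, 1);

       // The sparse matrix-matrix multiplication itself.
       cusparseLtMatmul(handle, plan, &alpha, dA_compressed, dB,
                        &beta, dC, dD, d_workspace, &stream, 1);

       cudaFree(dA_compressed);
       cudaFree(d_compress_buf);
       cudaFree(d_workspace);
   }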

================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.7`, `SM 8.9`, `SM 9.0`, `SM 10.0`, `SM 10.1` (for CTK 12), `SM 11.0` (for CTK 13), `SM 12.0`, `SM 12.1`
* *Supported CPU architectures and operating systems*:

+------------+--------------------+
| OS         | CPU archs          |
+============+====================+
| `Windows`  | `x86_64`           |
+------------+--------------------+
| `Linux`    | `x86_64`, `Arm64`  |
+------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

   pip install nvidia-cusparselt-cuXX

where ``XX`` is the CUDA major version; for this package (CUDA 13), the command is ``pip install nvidia-cusparselt-cu13``.

            
