| Field | Value |
|-----------------|-------|
| Name | deep-nccl-wrapper |
| Version | 1.0.2 |
| Home page | https://help.aliyun.com/document_detail/462422.html?spm=a2c4g.462031.0.0.c5f96b4drcx52F |
| Summary | Deep-NCCL is an AI-accelerator communication framework for NVIDIA NCCL. It implements optimized all-reduce, all-gather, reduce, broadcast, reduce-scatter, and all-to-all, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on Aliyun machines using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs, eRDMA, or TCP/IP sockets. |
| Upload time | 2023-11-09 08:11:48 |
| Maintainer | (none) |
| Docs URL | None |
| Author | Alibaba Cloud |
| Requires Python | >=3.0 |
| License | Copyright (C) Alibaba Group Holding Limited |
| Keywords | distributed, deep learning, communication, nccl, aiacc, deepnccl |
| Requirements | No requirements were recorded. |
# Deep-NCCL-Wrapper
Deep-NCCL-Wrapper is a wrapper for [DeepNCCL](https://pypi.org/project/deepnccl/), which provides optimized primitives for inter-GPU communication on Aliyun machines.
## Introduction
Deep-NCCL is an AI-Accelerator communication framework for NVIDIA-NCCL.
It implements optimized all-reduce, all-gather, reduce, broadcast, reduce-scatter, all-to-all, as well as any send/receive based communication pattern.
It has been optimized to achieve high bandwidth on Aliyun machines using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs, eRDMA, or TCP/IP sockets.
## Install
To install Deep-NCCL on the system, use one of the following two methods (run the package installation as root):
- Method 1: rpm/deb package (recommended)
```sh
# CentOS:
wget https://aiacc.oss-accelerate.aliyuncs.com/nccl/rpm/deep-nccl-2.0.1.rpm
rpm -i deep-nccl-2.0.1.rpm
# Ubuntu:
wget https://aiacc.oss-accelerate.aliyuncs.com/nccl/deb/deep-nccl-2.0.1.deb
dpkg -i deep-nccl-2.0.1.deb
```
- Method 2: Python package from PyPI
```sh
pip install deep-nccl-wrapper
```
## Usage
After installing the deep-nccl package, no changes to your code are required.
## Environment
* ***AIACC_FASTTUNING***: Enable Fasttuning for LLMs (default `1`, i.e. enabled).
* ***NCCL_AIACC_ALLREDUCE_DISABLE***: Disable the optimized all-reduce algorithm (default `0`, i.e. the algorithm is enabled).
* ***NCCL_AIACC_ALLGATHER_DISABLE***: Disable the optimized all-gather algorithm (default `0`, i.e. the algorithm is enabled).
* ***NCCL_AIACC_REDUCE_SCATTER_DISABLE***: Disable the optimized reduce-scatter algorithm (default `0`, i.e. the algorithm is enabled).
* ***AIACC_UPDATE_ALGO_DISABLE***: Disable updating the AIACC NCCL algorithm from aiacc-sql-server (default `0`, i.e. updates are enabled).
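These switches are plain environment variables, so they can be set in the shell before launching a job. A minimal sketch (variable names are from the list above; which values make sense depends on your workload):

```sh
# Keep Fasttuning at its default (enabled) and switch off only the
# AIACC all-reduce algorithm, e.g. to compare against stock NCCL.
export AIACC_FASTTUNING=1
export NCCL_AIACC_ALLREDUCE_DISABLE=1
# Any launcher started from this shell (torchrun, mpirun, ...)
# inherits these settings.
```

Because the variables are read at runtime, toggling an algorithm on or off needs only a job restart, not a reinstall.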
## Performance
Deep-NCCL can speed up NCCL performance on Aliyun EGS (GPU) instances; for example, the instance type `ecs.ebmgn7ex.32xlarge` has 8 A100 GPUs and uses eRDMA networking.
| GPU (EGS) | Collective | Nodes | Network | Speedup (nccl-tests) |
|-------------|----------------|---------|-----------|----------------------|
| A100 x 8 | all_gather | 2-10 | VPC/eRDMA | 30%+ |
| A100 x 8 | reduce_scatter | 2-10 | VPC/eRDMA | 30%+ |
| A100 x 8 | all_reduce | 2-10 | VPC/eRDMA | 20% |
| V100 x 8 | all_reduce | 2-20 | VPC | 60%+ |
| A10 x 8 | all_reduce | 1 | - | 20% |
## Copyright
All source code and accompanying documentation are copyright (c) 2015-2020, NVIDIA CORPORATION. All rights reserved.
All modifications are copyright (c) 2020-2024, ALIYUN CORPORATION. All rights reserved.