Name | deepnccl
Version | 1.0.6
home_page | https://help.aliyun.com/document_detail/462422.html?spm=a2c4g.462031.0.0.c5f96b4drcx52F
Summary | AIACC-NCCL is an AI-Accelerator communication framework for NVIDIA-NCCL. It implements optimized all-reduce, all-gather, reduce, broadcast, reduce-scatter, all-to-all, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on Aliyun machines using PCIe, NVLink, NVSwitch, as well as networking using InfiniBand Verbs, eRDMA or TCP/IP sockets.
upload_time | 2023-11-09 07:59:07
maintainer | 
docs_url | None
author | Alibaba Cloud
requires_python | >=3.0
license | Copyright (C) Alibaba Group Holding Limited
keywords | distributed, deep learning, communication, nccl, aiacc
requirements | No requirements were recorded.
# Deep-NCCL
Optimized primitives for inter-GPU communication on Aliyun machines.
## Introduction
Deep-NCCL is an AI-Accelerator communication framework for NVIDIA-NCCL.
It implements optimized all-reduce, all-gather, reduce, broadcast, reduce-scatter, all-to-all, as well as any send/receive based communication pattern.
It has been optimized to achieve high bandwidth on Aliyun machines using PCIe, NVLink, NVSwitch, as well as networking using InfiniBand Verbs, eRDMA or TCP/IP sockets.
## Install
To install Deep-NCCL on your system, download a package and install it as root using one of the following methods:
- method1: rpm/deb (recommended)
```sh
# CentOS:
wget http://mirrors.aliyun.com/aiacc/aiacc-nccl/aiacc_nccl-1.0.rpm
rpm -i aiacc_nccl-1.0.rpm
# Ubuntu:
wget http://mirrors.aliyun.com/aiacc/aiacc-nccl/aiacc_nccl-1.0.deb
dpkg -i aiacc_nccl-1.0.deb
```
- method2: python-offline
```sh
wget http://mirrors.aliyun.com/aiacc/aiacc-nccl/aiacc_nccl-2.0.0.tar.gz
pip install aiacc_nccl-2.0.0.tar.gz
# Note: download the archive first, then pip install it; installing directly
# from the URL in a single `pip install <url>` step does not work.
# Methods 1 and 2 can coexist on the same machine.
```
- method3: python-pypi
```sh
pip install aiacc_nccl==2.0
```
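To confirm the installation succeeded, you can query the relevant package manager. A minimal check, assuming the package names used in the commands above:

```sh
# Check the pip package (methods 2 and 3); prints a message if it is absent.
pip show aiacc_nccl 2>/dev/null || echo "aiacc_nccl not installed via pip"

# Check the system package (method 1); the package name is assumed from the
# .rpm/.deb files above, so adjust it if your download used a different name.
rpm -q aiacc-nccl 2>/dev/null || dpkg -s aiacc-nccl 2>/dev/null \
  || echo "no aiacc-nccl system package found"
```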
## Usage
After installing the aiacc-nccl package, no code changes are required; existing NCCL-based programs pick up the optimizations automatically.
## Environment
* ***AIACC_FASTTUNING***: Enable Fasttuning for LLMs (default: 1, enabled).
* ***NCCL_AIACC_ALLREDUCE_DISABLE***: Disable the optimized all-reduce algorithm (default: 0, i.e. the algorithm is enabled).
* ***NCCL_AIACC_ALLGATHER_DISABLE***: Disable the optimized all-gather algorithm (default: 0, i.e. enabled).
* ***NCCL_AIACC_REDUCE_SCATTER_DISABLE***: Disable the optimized reduce-scatter algorithm (default: 0, i.e. enabled).
* ***AIACC_UPDATE_ALGO_DISABLE***: Disable fetching updated AIACC NCCL algorithms from aiacc-sql-server (default: 0, i.e. updates are enabled).
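For example, to fall back to stock NCCL all-reduce while keeping the other optimized collectives, the variables can be exported before launching training. A sketch using the variable names listed above (the actual launch command depends on your framework):

```sh
# Keep Fasttuning at its default and disable only the AIACC all-reduce, so
# NCCL's built-in all-reduce is used while all-gather and reduce-scatter
# remain on the optimized code paths.
export AIACC_FASTTUNING=1
export NCCL_AIACC_ALLREDUCE_DISABLE=1
export NCCL_AIACC_ALLGATHER_DISABLE=0
export NCCL_AIACC_REDUCE_SCATTER_DISABLE=0
echo "AIACC all-reduce disabled: $NCCL_AIACC_ALLREDUCE_DISABLE"
```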
## Performance
Deep-NCCL speeds up NCCL performance on Aliyun EGS (GPU) instances. For example, the instance type 'ecs.ebmgn7ex.32xlarge' provides 8 A100 GPUs and eRDMA networking.
| GPU (EGS) | Collective     | Nodes | Network   | Speedup (nccl-tests) |
|-----------|----------------|-------|-----------|----------------------|
| A100 x 8  | all_gather     | 2-10  | VPC/eRDMA | 30%+                 |
| A100 x 8  | reduce_scatter | 2-10  | VPC/eRDMA | 30%+                 |
| A100 x 8  | all_reduce     | 2-10  | VPC/eRDMA | 20%                  |
| V100 x 8  | all_reduce     | 2-20  | VPC       | 60%+                 |
| A10 x 8   | all_reduce     | 1     | -         | 20%                  |
## Copyright
All source code and accompanying documentation are copyright (c) 2015-2020, NVIDIA CORPORATION. All rights reserved.
All modifications are copyright (c) 2020-2024, ALIYUN CORPORATION. All rights reserved.
Raw data
{
"_id": null,
"home_page": "https://help.aliyun.com/document_detail/462422.html?spm=a2c4g.462031.0.0.c5f96b4drcx52F",
"name": "deepnccl",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.0",
"maintainer_email": "",
"keywords": "Distributed,Deep Learning,Communication,NCCL,AIACC",
"author": "Alibaba Cloud",
"author_email": "ziqi.yzq@alibaba-inc.com",
"download_url": "https://files.pythonhosted.org/packages/75/57/04e15db45f4168bb9d27d97433ce462d39a008a59ed62b59ee0e31441f5d/deepnccl-1.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Copyright (C) Alibaba Group Holding Limited",
"summary": "AIACC-NCCL is an AI-Accelerator communication framework for NVIDIA-NCCL. It implements optimized all-reduce, all-gather, reduce, broadcast, reduce-scatter, all-to-all,as well as any send/receive based communication pattern.It has been optimized to achieve high bandwidth on aliyun machines using PCIe, NVLink, NVswitch,as well as networking using InfiniBand Verbs, eRDMA or TCP/IP sockets.",
"version": "1.0.6",
"project_urls": {
"Homepage": "https://help.aliyun.com/document_detail/462422.html?spm=a2c4g.462031.0.0.c5f96b4drcx52F"
},
"split_keywords": [
"distributed",
"deep learning",
"communication",
"nccl",
"aiacc"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "755704e15db45f4168bb9d27d97433ce462d39a008a59ed62b59ee0e31441f5d",
"md5": "f0d1e49992a3892c6f0a56f620d44b76",
"sha256": "e26433f6dc8f7ba23c3aa9d20018757da81d4c7ae7b557eb8ae194fe6cd31b97"
},
"downloads": -1,
"filename": "deepnccl-1.0.6.tar.gz",
"has_sig": false,
"md5_digest": "f0d1e49992a3892c6f0a56f620d44b76",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.0",
"size": 90918659,
"upload_time": "2023-11-09T07:59:07",
"upload_time_iso_8601": "2023-11-09T07:59:07.636425Z",
"url": "https://files.pythonhosted.org/packages/75/57/04e15db45f4168bb9d27d97433ce462d39a008a59ed62b59ee0e31441f5d/deepnccl-1.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-09 07:59:07",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "deepnccl"
}