| Field | Value |
|:----- |:----- |
| Name | juxai2022 |
| Version | 1.0 |
| Summary | JUX is a jax-accelerated engine for Lux-2022. |
| upload_time | 2022-12-26 08:33:26 |
| home_page | |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | <3.11,>=3.7 |
| license | MIT License Copyright (c) 2022 Qimai Li, Yuhao Jiang. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| keywords | jax, luxai2022 |
| requirements | No requirements were recorded. |
# JUX
JUX is a <ins>J</ins>ax-accelerated game core for L<ins>ux</ins> AI Challenge Season 2, designed to maximize game environment throughput for reinforcement learning (RL) training.
## Installation
### Install dependencies
One of the main dependencies is JAX, which in turn relies on NVCC, the CUDA Toolkit, and cuDNN. There are two ways to get them ready: conda, or docker (recommended).
For conda users, you can install them with the following commands.
```console
$ conda install -c nvidia cuda-nvcc cuda-python
$ conda install cudnn
```
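Afterwards, you can confirm that the compiler toolchain is visible in your environment:
```console
$ nvcc --version
```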
For docker users, you can use the [NVIDIA CUDA docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda) or the [PyTorch docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), which has all of them ready and compatible with each other.
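For example, a container started along the following lines comes with NVCC, CUDA, and cuDNN preinstalled (the image tag here is only illustrative; pick one compatible with your driver):
```console
$ docker run --gpus all -it nvcr.io/nvidia/pytorch:22.12-py3
```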
### Install JUX
First, you need to clone the repository.
```console
$ git clone https://github.com/RoboEden/jux.git
$ cd jux
```
Then, upgrade pip and install JUX.
```console
$ pip install --upgrade pip
$ pip install . -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
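The `-f` index points pip at CUDA-enabled JAX wheels. Once installation finishes, a quick check that the GPU backend actually loaded can save debugging time later (the exact device name printed varies across JAX versions):
```console
$ python -c "import jax; print(jax.devices())"
```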
<!-- TODO: release to PyPI -->
PyTorch users can install JUX together with its optional PyTorch dependencies.
```console
$ pip install ".[torch]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
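The `torch` extra is presumably aimed at pipelines that feed JUX outputs into PyTorch models. Independent of any JUX-specific helpers, a JAX array on GPU can be handed to PyTorch without a host round-trip via the standard DLPack bridge; a minimal sketch:
```python
import jax.dlpack
import jax.numpy as jnp
import torch.utils.dlpack

x = jnp.arange(8.0)  # a JAX array; lives on GPU when one is available

# Zero-copy handoff: expose the JAX buffer as a torch.Tensor via DLPack.
t = torch.utils.dlpack.from_dlpack(jax.dlpack.to_dlpack(x))
```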
## Usage
See [tutorial.ipynb](tutorial.ipynb) for a quick start. JUX is guaranteed to implement the same game logic as `luxai2022==1.1.4`, provided players' input actions are valid. When input actions are invalid, JUX and LuxAI2022 may handle them differently.
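Roughly, opening an environment looks like the following (a minimal sketch; treat the exact class and argument names, e.g. `JuxEnv`, `EnvConfig`, `JuxBufferConfig`, as assumptions to be checked against `tutorial.ipynb`):
```python
from jux.config import EnvConfig, JuxBufferConfig
from jux.env import JuxEnv

env = JuxEnv(
    env_cfg=EnvConfig(),
    buf_cfg=JuxBufferConfig(MAX_N_UNITS=200),  # see the Performance section
)
state = env.reset(0)  # seed; returns a JAX pytree describing the game state
```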
## Performance
JUX maps all game logic to array operators in JAX so that it can harness the computational power of modern GPUs and run thousands of environments in parallel. We benchmarked JUX on several different GPUs and observed throughput gains of hundreds to thousands of times over the original single-thread Python implementation.
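The parallelism comes from standard JAX batching: a step function written for a single environment is mapped over a batch dimension with `jax.vmap` and compiled once with `jax.jit`. A toy sketch of the pattern (not JUX's actual step function):
```python
import jax
import jax.numpy as jnp

def step(state, action):
    # Toy single-environment transition: state and action are scalars.
    return state + action

# One compiled kernel advances all environments at once.
batched_step = jax.jit(jax.vmap(step))

states  = jnp.zeros(20_000)  # a batch of 20k environments
actions = jnp.ones(20_000)
states  = batched_step(states, actions)
```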
LuxAI2022 is a game with a dynamic number of units, which makes it hard to accelerate with JAX, because `jax.jit()` only supports arrays with static shapes. As a workaround, we allocate a buffer of static length to store units. The buffer length (`buf_cfg.MAX_N_UNITS`) greatly affects performance. Theoretically, no player can build more than 1500 units under the current game configs, so `MAX_N_UNITS=1500` is a safe choice. However, from watching game replays, we found that no player builds more than 200 units, so `MAX_N_UNITS=200` is a practical choice.
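The workaround amounts to padding: units occupy a fixed-length buffer plus a validity mask, so array shapes never change under `jax.jit()`. A toy sketch of the idea (not JUX's actual data layout):
```python
import jax
import jax.numpy as jnp

MAX_N_UNITS = 200

# Fixed-shape buffers: positions for up to MAX_N_UNITS units, plus a mask
# marking which slots are actually occupied.
pos  = jnp.zeros((MAX_N_UNITS, 2), dtype=jnp.int32)
mask = jnp.zeros(MAX_N_UNITS, dtype=bool).at[:3].set(True)  # 3 live units

@jax.jit
def move_all(pos, mask, delta):
    # Apply the update only to live units; padded slots stay untouched.
    return jnp.where(mask[:, None], pos + delta, pos)

pos = move_all(pos, mask, jnp.array([0, 1], dtype=jnp.int32))
```
A larger `MAX_N_UNITS` means more padded slots doing wasted work, which is why throughput in the benchmarks below drops as the unit cap grows.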
### Relative Throughput
Here, we report throughput relative to the original Python implementation (`luxai2022==1.1.3`) on several different GPUs with different `MAX_N_UNITS` settings. The original single-thread Python implementation running on an Intel Xeon Platinum 8255C CPU serves as the baseline. Throughput is roughly proportional to GPU memory bandwidth, because the game logic is mostly memory-bound, not compute-bound: byte-access operators make up a large portion of the game logic in the JUX implementation.
| GPU                  | GPU Mem. Bandwidth | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
|:-------------------- | ------------------:| ----------:| ---------:| ---------:| ---------:| ---------:| ---------:| ----------:|
| A100-SXM4-40GB       | 1555 GB/s          | 20k        | 1166x     | 985x      | 748x      | 598x      | 508x      | 437x       |
| Tesla V100-SXM2-32GB | 900 GB/s           | 20k        | 783x      | 647x      | 480x      | 375x      | 317x      | 269x       |
| Tesla T4             | 320 GB/s           | 10k        | 263x      | 217x      | 160x      | 125x      | 105x      | 89x        |
| GTX 1660 Ti          | 288 GB/s           | 3k         | 218x      | 178x      | 130x      | 103x      | 84x       | 71x        |
| CPU                                       | Batch Size | Relative Throughput |
|:----------------------------------------- | ----------:| -------------------:|
| Intel® Core™ i7-12700                     | 1          | 2.12x               |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1          | 1.00x               |
| Intel® Xeon® Gold 6133 @ 2.50GHz          | 1          | 0.89x               |
### Absolute Throughput
We also report absolute throughput in steps per second here.
| GPU                  | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |
|:-------------------- | ----------:| ---------:| ---------:| ---------:| ---------:| ---------:| ----------:|
| A100-SXM4-40GB       | 20k        | 363k      | 307k      | 233k      | 186k      | 158k      | 136k       |
| Tesla V100-SXM2-32GB | 20k        | 244k      | 202k      | 149k      | 117k      | 99k       | 84k        |
| Tesla T4             | 10k        | 82k       | 68k       | 50k       | 39k       | 33k       | 28k        |
| GTX 1660 Ti          | 3k         | 68k       | 56k       | 40k       | 32k       | 26k       | 22k        |
| CPU                                       | Batch Size | Throughput (steps/s) |
|:----------------------------------------- | ----------:| --------------------:|
| Intel® Core™ i7-12700                     | 1          | 0.66k                |
| Intel® Xeon® Platinum 8255C CPU @ 2.50GHz | 1          | 0.31k                |
| Intel® Xeon® Gold 6133 @ 2.50GHz          | 1          | 0.28k                |
## Contributing
If you find any bugs or have any suggestions, please feel free to open an issue or a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on setting up a development environment.
## Raw data
```json
{
    "_id": null,
    "home_page": "",
    "name": "juxai2022",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<3.11,>=3.7",
    "maintainer_email": "",
    "keywords": "JAX,LuxAI2022",
    "author": "",
    "author_email": "Qimai Li <qimaili@chaocanshu.ai>, Yuhao Jiang <yuhaojiang@chaocanshu.ai>",
    "download_url": "https://files.pythonhosted.org/packages/2d/1f/ad1cf1e16f346b7fb894d91d44f16d87ac5d56df36e2215d1fd043ec1ab9/juxai2022-1.0.tar.gz",
    "platform": null,
"description": "# JUX\nJUX is a <ins>J</ins>ax-accelerated game core for L<ins>ux</ins> AI Challenge Season 2, aimed to maximize game environment throughput for reinforcement learning (RL) training.\n\n## Installation\n### Install dependencies\nOne of the main dependencies is JAX, which in turn relies on NVCC, CUDA Toolkit and cuDNN. There are two ways to get them ready, either by conda or docker (recommended).\n\nFor conda users, you can install them with the following commands.\n```console\n$ conda install -c nvidia cuda-nvcc cuda-python\n$ conda install cudnn\n```\nFor docker users, you can use the [NVIDIA CUDA docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda) or the [PyTorch docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), which has all of them ready and compatible with each other.\n\n### Install JUX\nFirst, you need to clone the repository.\n```console\ngit clone https://github.com/RoboEden/jux.git\ncd jux\n```\nThen, upgrade your pip and install JUX.\n```console\n$ pip install --upgrade pip\n$ pip install ./jux -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html\n```\n<!-- TODO: release to PyPI -->\nFor PyTorch users, you can install JUX with optional dependencies for PyTorch.\n```console\n$ pip install ./jux[torch] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html\n```\n\n## Usage\n See [tutorial.ipynb](tutorial.ipynb) for a quick start. JUX is guaranteed to implement the same game logic as `luxai2022==1.1.4`, if players' input actions are valid. When players' input actions are invalid, JUX and LuxAI2022 may process them differently.\n\n## Performance\nJUX maps all game logic to array operators in JAX so that we can harvest the computational power of modern GPUs and support tons of environments running in parallel. We benchmarked JUX on several different GPUs, and increased the throughput by hundreds to thousands of times, compared with the original single-thread Python implementation.\n\nLuxAI2022 is a game with a dynamic number of units, making it hard to be accelerated by JAX, because `jax.jit()` only supports arrays with static shapes. As a workaround, we allocate a large buffer with static length to store units. The buffer length (`buf_cfg.MAX_N_UNITS`) greatly affects the performance. Theoretically, no player can build more than 1500 units under current game configs, so `MAX_N_UNITS=1500` is a safe choice. However, we found that no player builds more than 200 units by watching game replays, so `MAX_N_UNITS=200` is a practical choice.\n\n### Relative Throughput\nHere, we report the relative throughput over the original Python implementation (`luxai2022==1.1.3`), on several different GPUs with different `MAX_N_UNITS` settings. The original single-thread Python implementation running on an 8255C CPU serves as the baseline. We can observe that the throughput is proportional to GPU memory bandwidth because the game logic is mostly memory-bound, not compute-bound. Byte-access operators take a large portion of the game logic in JUX implementation.\n\n| Relative Throughput | | | | | | | | |\n|:-------------------- | ------------------:| ----------:| ---------:| ---------:| ---------:| ---------:| ---------:| ----------:|\n| GPU | GPU Mem. 
Bandwidth | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |\n| A100-SXM4-40GB | 1555 GB/s | 20k | 1166x | 985x | 748x | 598x | 508x | 437x |\n| Tesla V100-SXM2-32GB | 900 GB/s | 20k | 783x | 647x | 480x | 375x | 317x | 269x |\n| Tesla T4 | 320 GB/s | 10k | 263x | 217x | 160x | 125x | 105x | 89x |\n| GTX 1660 Ti | 288 GB/s | 3k | 218x | 178x | 130x | 103x | 84x | 71x |\n\n\n| | Batch Size | Relative Throughput |\n|:----------------------------------------- | ----------:| -------------------:|\n| Intel\u00ae Core\u2122 i7-12700\t \t | 1 | 2.12x |\n| Intel\u00ae Xeon\u00ae Platinum 8255C CPU @ 2.50GHz\t| 1 | 1.00x |\n| Intel\u00ae Xeon\u00ae Gold 6133 @ 2.50GHz\t | 1 | 0.89x |\n\n\n\n### Absolute Throughput\nWe also report absolute throughput in steps per second here.\n| Throughput (steps/s) | | | | | | | |\n|:-------------------- | ----------:| ---------:| ---------:| ---------:| ---------:| ---------:| ----------:|\n| GPU | Batch Size | UNITS=100 | UNITS=200 | UNITS=400 | UNITS=600 | UNITS=800 | UNITS=1000 |\n| A100-SXM4-40GB | 20k | 363k | 307k | 233k | 186k | 158k | 136k |\n| Tesla V100-SXM2-32GB | 20k | 244k | 202k | 149k | 117k | 99k | 84k |\n| Tesla T4 | 10k | 82k | 68k | 50k | 39k | 33k | 28k |\n| GTX 1660 Ti | 3k | 68k | 56k | 40k | 32k | 26k | 22k |\n\n| | Batch Size | Throughput (steps/s) |\n|:----------------------------------------- | ----------:| --------------------:|\n| Intel\u00ae Core\u2122 i7-12700\t \t | 1 | 0.66k |\n| Intel\u00ae Xeon\u00ae Platinum 8255C CPU @ 2.50GHz\t| 1 | 0.31k |\n| Intel\u00ae Xeon\u00ae Gold 6133 @ 2.50GHz\t | 1 | 0.28k |\n\n## Contributing\nIf you find any bugs or have any suggestions, please feel free to open an issue or a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) for setting up a developing environment.\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2022 Qimai Li, Yuhao Jiang. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "JUX is a jax-accelerated engine for Lux-2022.",
"version": "1.0",
"split_keywords": [
"jax",
"luxai2022"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "e93ba23ade0363f1ef8a15774f281d13",
"sha256": "0e8d58251c04fe1560fa668a3506172fc892d8889ed8b373082ec08e732bfa72"
},
"downloads": -1,
"filename": "juxai2022-1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e93ba23ade0363f1ef8a15774f281d13",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.11,>=3.7",
"size": 57344,
"upload_time": "2022-12-26T08:33:23",
"upload_time_iso_8601": "2022-12-26T08:33:23.088826Z",
"url": "https://files.pythonhosted.org/packages/cb/9d/70e27e6320775d6568ba539e05597a4c51572290063204a65814b9b10bbd/juxai2022-1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "fcc372ccb57ad75644ba00f0ea3d3b36",
"sha256": "03fd266a04d172071d7fa2103d90a26d86319c07cf018bcda48634105ed5cfac"
},
"downloads": -1,
"filename": "juxai2022-1.0.tar.gz",
"has_sig": false,
"md5_digest": "fcc372ccb57ad75644ba00f0ea3d3b36",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.11,>=3.7",
"size": 1037723,
"upload_time": "2022-12-26T08:33:26",
"upload_time_iso_8601": "2022-12-26T08:33:26.321593Z",
"url": "https://files.pythonhosted.org/packages/2d/1f/ad1cf1e16f346b7fb894d91d44f16d87ac5d56df36e2215d1fd043ec1ab9/juxai2022-1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-26 08:33:26",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "juxai2022"
}