hsuanwu

- Name: hsuanwu
- Version: 0.0.1b7
- Summary: Long-Term Evolution Project of Reinforcement Learning
- Upload time: 2023-05-15 05:05:59
- Requires Python: >=3.7
- Keywords: algorithm, baseline, evolution, reinforcement learning
- Requirements: no requirements were recorded

<div align=center>
<img src='./docs/assets/images/logo.png' style="width: 70%">
</div>

|<img src="https://img.shields.io/badge/License-MIT-%230677b8"> <img src="https://img.shields.io/badge/GPU-NVIDIA-%2377b900"> <img src="https://img.shields.io/badge/NPU-Ascend-%23c31d20"> <img src="https://img.shields.io/badge/Python-%3E%3D3.8-%2335709F"> <img src="https://img.shields.io/badge/Docs-Passing-%23009485"> <img src="https://img.shields.io/badge/Codestyle-Black-black"> <img src="https://img.shields.io/badge/PyPI%20Package-0.0.1-%23006DAD"> <img src="https://img.shields.io/badge/πŸ€—Benchmark-HuggingFace-%23FFD21E"> <img src="https://img.shields.io/badge/Pytorch-%3E%3D2.0.0-%23EF5739"> <img src="https://img.shields.io/badge/Hydra-1.3.2-%23E88444"> <img src="https://img.shields.io/badge/Gymnasium-%3E%3D0.28.1-brightgreen"> <img src="https://img.shields.io/badge/DMC Suite-1.0.11-blue"> <img src="https://img.shields.io/badge/Procgen-0.10.7-blueviolet"> <img src="https://img.shields.io/badge/2.2.1-MiniGrid-%23c8c8c8"> <img src="https://img.shields.io/badge/PyBullet-3.2.5-%236A94D4">|
|:-:|

**Hsuanwu: Long-Term Evolution Project of Reinforcement Learning** is inspired by the long-term evolution (LTE) standard project in telecommunications. It aims to track the latest research progress in reinforcement learning (RL) and to provide stable and efficient baselines. In Hsuanwu, you can find everything you need for RL, such as training, evaluation, and deployment. Highlight features of Hsuanwu:

- ⏱️ Latest algorithms and tricks;
- 🧱 Highly modularized design for complete decoupling of RL algorithms;
- πŸš€ Optimized workflow for full hardware acceleration;
- βš™οΈ Support for custom environments;
- πŸ–₯️ Support for multiple computing devices like GPU and NPU;
- πŸ› οΈ Support for RL model engineering deployment (TensorRT, CANN, ...);
- πŸ’Ύ Large number of reusable benchmarks ([see HsuanwuHub](https://hub.hsuanwu.dev));
- πŸ“‹ Elegant experimental management powered by [Hydra](https://hydra.cc/).

Hsuanwu ([Xuanwu, ηŽ„ζ­¦](https://en.wikipedia.org/wiki/Xuanwu_(god))) is one of the Four Symbols of the Chinese constellations, representing the north and the winter season. It is usually depicted as a turtle entwined together with a snake. Since turtles are very long-lived, we use this name to symbolize the long-term and influential development of the project.

Join the developer community for issues and discussions:
|Slack|QQ|GitHub|
|:-:|:-:|:-:|
|<a href="https://app.slack.com/client/T054J4NJXP0/C054T78QZ9A"><img src='./docs/assets/images/slack.png' style="width: 50%" ></a>|<img src='./docs/assets/images/qq.jpg' style="width: 65%">|<a href="https://github.com/RLE-Foundation/Hsuanwu/issues"><img src='./docs/assets/images/github_issues.png' style="width: 50%"></a>|



<!-- Please cite the following paper if you use Hsuanwu in your work, thank you!
```bibtex
@article{yuan2023hsuanwu,
  title={Hsuanwu: Long-Term Evolution Project of Reinforcement Learning},
  author={Yuan, Mingqi and Luo, Shihao and Zhang, Zequn and Yang, Xu and Jin, Xin and Li, Bo and Zeng, Wenjun},
  journal={arXiv preprint arXiv:2311.15277},
  year={2023}
}
``` -->

- [Quick Start](#quick-start)
  - [Installation](#installation)
  - [Build your first Hsuanwu application](#build-your-first-hsuanwu-application)
    - [On NVIDIA GPU](#on-nvidia-gpu)
    - [On HUAWEI NPU](#on-huawei-npu)
- [Implemented Modules](#implemented-modules)
  - [Roadmap](#roadmap)
  - [Project Structure](#project-structure)
  - [RL Agents](#rl-agents)
  - [Intrinsic Reward Modules](#intrinsic-reward-modules)
- [Model Zoo](#model-zoo)
- [API Documentation](#api-documentation)
- [How To Contribute](#how-to-contribute)
- [Acknowledgment](#acknowledgment)

# Quick Start
## Installation
- Prerequisites

Currently, Hsuanwu recommends `Python>=3.8`. You can create a virtual environment with `conda`:
``` sh
conda create -n hsuanwu python=3.8
```
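
Then activate the environment before installing Hsuanwu:
``` sh
conda activate hsuanwu
```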

- with pip (recommended)

Open up a terminal and install **Hsuanwu** with `pip`:
``` shell
pip install hsuanwu # basic installation
pip install hsuanwu[envs] # for pre-defined environments
pip install hsuanwu[tests] # for project tests
pip install hsuanwu[all] # install all the dependencies
```
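
Note that some shells (e.g., `zsh`) expand square brackets, so the extras may need to be quoted:
``` sh
pip install "hsuanwu[envs]"
```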

- with git

Open up a terminal and clone the repository from [GitHub](https://github.com/RLE-Foundation/Hsuanwu) with `git`:
``` sh
git clone https://github.com/RLE-Foundation/Hsuanwu.git
```
After that, run one of the following commands to install the package and its dependencies:
``` sh
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
pip install -e .[tests] # for project tests
pip install -e .[all] # install all the dependencies
```
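
To check that the installation succeeded, try importing the package (this only verifies that the package is importable, not that every optional dependency is installed):
``` sh
python -c "import hsuanwu"
```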

For more detailed installation instructions, see [https://docs.hsuanwu.dev/getting_started](https://docs.hsuanwu.dev/getting_started).

## Build your first Hsuanwu application
### On NVIDIA GPU
Suppose we want to use [DrQ-v2](https://openreview.net/forum?id=_SJ-_yyes8) to solve a task from the [DeepMind Control Suite](https://github.com/deepmind/dm_control); only two steps are needed:

1. Write a `config.yaml` file in your working directory like:
``` yaml
experiment: drqv2_dmc     # Experiment ID.
device: cuda:0            # Device (cpu, cuda, ...) on which the code should be run.
seed: 1                   # Random seed for reproduction.
num_train_steps: 250000   # Number of training steps.

agent:
  name: DrQv2             # The agent name.
```

2. Write a `train.py` file like:
``` python
import hydra # Use Hydra to manage experiments

from hsuanwu.env import make_dmc_env # Import DeepMind Control Suite
from hsuanwu.common.engine import HsuanwuEngine # Import Hsuanwu engine

train_env = make_dmc_env(env_id='cartpole_balance') # Create train env
test_env = make_dmc_env(env_id='cartpole_balance') # Create test env

@hydra.main(version_base=None, config_path='./', config_name='config')
def main(cfgs):
    engine = HsuanwuEngine(cfgs=cfgs, train_env=train_env, test_env=test_env) # Initialize engine
    engine.invoke() # Start training

if __name__ == '__main__':
    main()
```
Run `train.py` and you will see the following output:

<div align=center>
<img src='./docs/assets/images/rl_training_gpu.png'>
</div>
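
Since the experiment is managed by Hydra, any field in `config.yaml` can also be overridden from the command line. For example (the values below are illustrative only):
``` sh
python train.py device=cuda:0 seed=7 num_train_steps=100000
```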

Alternatively, you can use `HsuanwuHub` for fast training; it presets a large number of ready-made RL applications. Install `HsuanwuHub` with `pip`:
``` sh
pip install hsuanwuhub
```
Then run the following command to perform training directly:
``` sh
python -m hsuanwuhub.train \
    task=drqv2_dmc_pixel \
    device=cuda:0 \
    num_train_steps=50000
```

### On HUAWEI NPU
Similarly, to train an agent on a HUAWEI NPU, it suffices to override the device in the training command:
``` sh
python train.py device=npu:0
```
Then you will see the following output:
<div align=center>
<img src='./docs/assets/images/rl_training_npu.png'>
</div>

> Please refer to [Implemented Modules](#implemented-modules) for NPU compatibility.

For more detailed tutorials, see [https://docs.hsuanwu.dev/tutorials](https://docs.hsuanwu.dev/tutorials).

# Implemented Modules
## Roadmap
Hsuanwu evolves around reinforcement learning algorithms and integrates the latest tricks. The following figure shows the main evolution roadmap of Hsuanwu:

<div align=center>
<img src='./docs/assets/images/roadmap.svg'  style="width: 90%">
</div>

## Project Structure

See the project structure below:
<div align=center>
<img src='./docs/assets/images/structure.svg' style="width: 90%">
</div>

- **[Common](https://docs.hsuanwu.dev/common_index/)**: Auxiliary modules like trainer and logger.
    + **Engine**: *Engine for building Hsuanwu application.*
    + **Logger**: *Logger for managing output information.*

- **[Xploit](https://docs.hsuanwu.dev/xploit_index/)**: Modules that focus on <font color="#B80000"><b>exploitation</b></font> in RL.
    + **Encoder**: *Neural network-based encoder for processing observations.*
    + **Agent**: *Agent for interacting and learning.*
    + **Storage**: *Storage for storing collected experiences.*

- **[Xplore](https://docs.hsuanwu.dev/xplore_index/)**: Modules that focus on <font color="#B80000"><b>exploration</b></font> in RL.
    + **Augmentation**: *PyTorch.nn-like modules for observation augmentation.*
    + **Distribution**: *Distributions for sampling actions.*
    + **Reward**: *Intrinsic reward modules for enhancing exploration.*

- **[Evaluation](https://docs.hsuanwu.dev/evaluation_index/)**: Reasonable and reliable metrics for algorithm evaluation.

- **[Env](https://docs.hsuanwu.dev/env_index/)**: Packaged environments (e.g., Atari games) for fast invocation.

- **[Pre-training](https://docs.hsuanwu.dev/pretraining_index/)**: Methods of <font color="#B80000"><b>pre-training</b></font> in RL.

- **[Deployment](https://docs.hsuanwu.dev/deployment_index/)**: Methods of <font color="#B80000"><b>model deployment</b></font> in RL.

For more detailed descriptions of these modules, see [https://docs.hsuanwu.dev/api](https://docs.hsuanwu.dev/api).
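
Since the modules are decoupled, they are typically composed through the Hydra configuration. The sketch below is purely illustrative: apart from `agent.name` (shown in the quick-start example), every key and module name is a hypothetical placeholder rather than the actual schema; see the API documentation for the real options.
``` yaml
experiment: drqv2_dmc        # Experiment ID.
device: cuda:0               # Target device.

agent:
  name: DrQv2                # Agent from the Xploit module (as in the quick-start example).

# Hypothetical placeholders illustrating how other modules might be selected:
# encoder:
#   name: <an encoder from Xploit>
# storage:
#   name: <a storage from Xploit>
# augmentation:
#   name: <an augmentation from Xplore>
# reward:
#   name: <an intrinsic reward module from Xplore, e.g., RE3>
```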

## RL Agents
|Module|Recurrent|Box|Discrete|MultiBinary|Multi Processing|NPU|Paper|Citations|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|SAC|❌| βœ”οΈ |❌|❌|❌|🐌 | [Link](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf) |5077⭐|
|DrQ|❌| βœ”οΈ |❌|❌|❌|🐌 | [Link](https://arxiv.org/pdf/2004.13649) |433⭐|
|DDPG|❌| βœ”οΈ |❌|❌|❌|βœ”οΈ | [Link](https://arxiv.org/pdf/1509.02971.pdf?source=post_page---------------------------) |11819⭐|
|DrQ-v2|❌| βœ”οΈ |❌|❌|❌|βœ”οΈ | [Link](https://arxiv.org/pdf/2107.09645.pdf?utm_source=morioh.com) |100⭐|
|PPO|❌| βœ”οΈ |βœ”οΈ|βœ”οΈ|βœ”οΈ|βœ”οΈ | [Link](https://arxiv.org/pdf/1707.06347) |11155⭐|
|DrAC|❌| βœ”οΈ |βœ”οΈ|βœ”οΈ|βœ”οΈ|βœ”οΈ | [Link](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf) |29⭐|
|DAAC|❌| βœ”οΈ |βœ”οΈ|βœ”οΈ|βœ”οΈ|🐌 | [Link](http://proceedings.mlr.press/v139/raileanu21a/raileanu21a.pdf) |56⭐|
|PPG|❌| βœ”οΈ |βœ”οΈ|❌|βœ”οΈ|🐌| [Link](http://proceedings.mlr.press/v139/cobbe21a/cobbe21a.pdf) |82⭐|
|IMPALA|βœ”οΈ| βœ”οΈ |βœ”οΈ|❌|βœ”οΈ|🐌| [Link](http://proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf) |1219⭐|

> - 🐌: Developing.
> - `NPU`: Supports neural processing units (NPUs).
> - `Recurrent`: Supports recurrent neural networks.
> - `Box`: An N-dimensional box that contains every point in the action space.
> - `Discrete`: A list of possible actions, of which only one can be used at each timestep.
> - `MultiBinary`: A list of possible actions, of which any combination can be used at each timestep.
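
For readers unfamiliar with these action-space types, the following minimal [Gymnasium](https://gymnasium.farama.org/) snippet (with illustrative shapes and sizes) shows what `Box`, `Discrete`, and `MultiBinary` look like:
``` python
import numpy as np
from gymnasium import spaces

box = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)  # continuous actions in [-1, 1]^3
discrete = spaces.Discrete(5)                                       # one of 5 mutually exclusive actions per timestep
multi_binary = spaces.MultiBinary(4)                                 # any combination of 4 binary actions per timestep

print(box.sample(), discrete.sample(), multi_binary.sample())
```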

## Intrinsic Reward Modules
| Module | Remark | Repr.  | Visual | Reference | 
|:-|:-|:-|:-|:-|
| PseudoCounts | Count-Based exploration |βœ”οΈ|βœ”οΈ|[Never Give Up: Learning Directed Exploration Strategies](https://arxiv.org/pdf/2002.06038) |
| ICM  | Curiosity-driven exploration  | βœ”οΈ|βœ”οΈ| [Curiosity-Driven Exploration by Self-Supervised Prediction](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf) | 
| RND  | Count-based exploration  | ❌|βœ”οΈ| [Exploration by Random Network Distillation](https://arxiv.org/pdf/1810.12894.pdf) | 
| GIRM | Curiosity-driven exploration  | βœ”οΈ |βœ”οΈ| [Intrinsic Reward Driven Imitation Learning via Generative Model](http://proceedings.mlr.press/v119/yu20d/yu20d.pdf)|
| NGU | Memory-based exploration  | βœ”οΈ  |βœ”οΈ| [Never Give Up: Learning Directed Exploration Strategies](https://arxiv.org/pdf/2002.06038) | 
| RIDE| Procedurally-generated environment | βœ”οΈ |βœ”οΈ| [RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments](https://arxiv.org/pdf/2002.12292)|
| RE3  | Entropy Maximization | ❌ |βœ”οΈ| [State Entropy Maximization with Random Encoders for Efficient Exploration](http://proceedings.mlr.press/v139/seo21a/seo21a.pdf) |
| RISE  | Entropy Maximization  | ❌  |βœ”οΈ| [RΓ©nyi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning](https://ieeexplore.ieee.org/abstract/document/9802917/) | 
| REVD  | Divergence Maximization | ❌  |βœ”οΈ| [Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning](https://openreview.net/pdf?id=V2pw1VYMrDo)|

> - 🐌: Developing.
> - `Repr.`: The method involves representation learning.
> - `Visual`: The method works well in visual RL.

See [Tutorials: Use intrinsic reward and observation augmentation](https://docs.hsuanwu.dev/tutorials/data_augmentation.md) for usage examples.
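
As a rough idea of how an intrinsic reward module could be plugged into a training loop, consider the hypothetical sketch below. The import path `hsuanwu.xplore.reward`, the constructor arguments, and the `compute_irs` method are assumptions inferred from the module layout above, not the verified API; please follow the tutorial link for actual usage.
``` python
# Hypothetical sketch: the import path, constructor arguments, and method names
# below are assumptions, not the verified Hsuanwu API.
import torch as th

from hsuanwu.xplore.reward import RE3  # assumed import path for the RE3 module

# Suppose `samples` holds a batch of observations collected by the agent,
# shaped (n_steps, n_envs, *obs_shape).
samples = {"obs": th.rand(128, 1, 9, 84, 84)}

irs_module = RE3(obs_shape=(9, 84, 84), action_shape=(7,), device="cuda:0")  # assumed signature
intrinsic_rewards = irs_module.compute_irs(samples)                          # assumed method
print(intrinsic_rewards.shape)
```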

# Model Zoo
Hsuanwu provides a large number of reusable benchmarks; see [https://hub.hsuanwu.dev/](https://hub.hsuanwu.dev/) and [https://docs.hsuanwu.dev/benchmarks/](https://docs.hsuanwu.dev/benchmarks/).

# API Documentation
View our well-designed documentation: [https://docs.hsuanwu.dev/](https://docs.hsuanwu.dev/)

# How To Contribute
Contributions to this project are welcome! Before you begin writing code, please read [CONTRIBUTING.md](https://github.com/RLE-Foundation/Hsuanwu/blob/main/CONTRIBUTING.md) for guidance.

# Acknowledgment
This project is supported by [FUNDING.yml](https://github.com/RLE-Foundation/Hsuanwu/blob/main/.github/FUNDING.yml). Some code in this project is borrowed from or inspired by several excellent projects, and we highly appreciate them. See [ACKNOWLEDGMENT.md](https://github.com/RLE-Foundation/Hsuanwu/blob/main/ACKNOWLEDGMENT.md).

            
