<div align=center>
<br>
<img src='./docs/assets/images/logo_horizontal.svg' style="width: 55%">
<br>
## RLLTE: Long-Term Evolution Project of Reinforcement Learning
<!-- <h3> <a href="https://arxiv.org/pdf/2309.16382.pdf"> Paper </a> |
<a href="https://docs.rllte.dev/api/"> Documentation </a> |
<a href="https://docs.rllte.dev/tutorials/"> Tutorials </a> |
<a href="https://github.com/RLE-Foundation/rllte/discussions"> Forum </a> |
<a href="https://hub.rllte.dev/"> Benchmarks </a></h3> -->
<img src="https://img.shields.io/badge/License-MIT-%230677b8"> <img src="https://img.shields.io/badge/GPU-NVIDIA-%2377b900"> <img src="https://img.shields.io/badge/NPU-Ascend-%23c31d20"> <img src="https://img.shields.io/badge/Python-%3E%3D3.8-%2335709F"> <img src="https://img.shields.io/badge/Docs-Passing-%23009485"> <img src="https://img.shields.io/badge/Codestyle-Black-black"> <img src="https://img.shields.io/badge/PyPI-0.0.1-%23006DAD">
<!-- <img src="https://img.shields.io/badge/Coverage-97.00%25-green"> -->
<!-- | [English](README.md) | [δΈζ](docs/README-zh-Hans.md) | -->
</div>
<!-- # Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
+ [Installation](#installation)
+ [Fast Training with Built-in Algorithms](#fast-training-with-built-in-algorithms)
- [On NVIDIA GPU](#on-nvidia-gpu)
- [On HUAWEI NPU](#on-huawei-npu)
+ [Three Steps to Create Your RL Agent](#three-steps-to-create-your-rl-agent)
+ [Algorithm Decoupling and Module Replacement](#algorithm-decoupling-and-module-replacement)
- [Function List (Part)](#function-list-part)
+ [RL Agents](#rl-agents)
+ [Intrinsic Reward Modules](#intrinsic-reward-modules)
- [RLLTE Ecosystem](#rllte-ecosystem)
- [API Documentation](#api-documentation)
- [Cite the Project](#cite-the-project)
- [How To Contribute](#how-to-contribute)
- [Acknowledgment](#acknowledgment)
- [Miscellaneous](#miscellaneous) -->
<!-- # Overview -->
Inspired by the long-term evolution (LTE) standard project in telecommunications, **RLLTE** aims to provide development components and standards for advancing RL research and applications. Beyond delivering top-notch algorithm implementations, **RLLTE** also serves as a **toolkit** for developing algorithms.
<!-- <div align="center">
<a href="https://youtu.be/PMF6fa72bmE" rel="nofollow">
<img src='./docs/assets/images/youtube.png' style="width: 70%">
</a>
<br>
An introduction to RLLTE.
</div> -->
Why **RLLTE**?
- 🧬 Long-term evolution for providing the latest algorithms and tricks;
- 🏞️ Complete ecosystem for task design, model training, evaluation, and deployment (TensorRT, CANN, ...);
- 🧱 Module-oriented design for complete decoupling of RL algorithms;
- πŸš€ Optimized workflow for full hardware acceleration;
- βš™οΈ Support for custom environments and modules;
- πŸ–₯️ Support for multiple computing devices like GPU and NPU;
- πŸ’Ύ A large number of reusable benchmarks ([RLLTE Hub](https://hub.rllte.dev));
- πŸ€– A large language model-empowered copilot ([RLLTE Copilot](https://github.com/RLE-Foundation/rllte-copilot)).
> ⚠️ Since the construction of RLLTE Hub requires massive computing power, we have to upload the training datasets and model weights gradually. A progress report can be found in [Issue#30](https://github.com/RLE-Foundation/rllte/issues/30).
See the project structure below:
<div align=center>
<img src='./docs/assets/images/structure.svg' style="width: 100%">
</div>
For more detailed descriptions of these modules, see [API Documentation](https://docs.rllte.dev/api).
# Quick Start
## Installation
- with pip (recommended)
Open a terminal and install **rllte** with `pip`:
``` shell
conda create -n rllte python=3.8 # create a virtual environment
conda activate rllte             # activate the environment before installing
pip install rllte-core # basic installation
pip install rllte-core[envs] # for pre-defined environments
```
- with git
Open a terminal and clone the repository from [GitHub](https://github.com/RLE-Foundation/rllte) with `git`:
``` sh
git clone https://github.com/RLE-Foundation/rllte.git
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
```
For more detailed installation instructions, see [Getting Started](https://docs.rllte.dev/getting_started).
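To verify either installation, a quick sanity check with the Python standard library is enough (a minimal sketch; it only assumes the distribution name `rllte-core` and the import name `rllte` used throughout this README):
``` python
# confirm that the package imports and report the installed version
from importlib.metadata import version

import rllte  # should import without errors

print(version("rllte-core"))
```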
## Fast Training with Built-in Algorithms
**RLLTE** provides implementations of well-recognized RL algorithms and a simple interface for building applications.
### On NVIDIA GPU
Suppose we want to use [DrQ-v2](https://openreview.net/forum?id=_SJ-_yyes8) to solve a task from the [DeepMind Control Suite](https://github.com/deepmind/dm_control); it suffices to write a `train.py` like:
``` python
# import `env` and `agent` module
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
if __name__ == "__main__":
    device = "cuda:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500000, log_interval=1000)
```
Run `train.py` and you will see the following output:
<div align=center>
<img src='./docs/assets/images/rl_training_gpu.gif' style="filter: drop-shadow(0px 0px 7px #000);">
</div>
### On HUAWEI NPU
Similarly, if we want to train an agent on HUAWEI NPU, it suffices to replace `cuda` with `npu`:
``` python
device = "cuda:0" -> device = "npu:0"
```
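For reference, the complete NPU version differs from the GPU example only in the device string (assuming the Ascend PyTorch plugin is installed so that `npu` devices are visible to PyTorch):
``` python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "npu:0"  # the only change from the GPU example
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    agent.train(num_train_steps=500000, log_interval=1000)
```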
## Three Steps to Create Your RL Agent
Developers only need three steps to implement an RL algorithm with **RLLTE**. The following example illustrates how to write an Advantage Actor-Critic (A2C) agent to solve Atari games.
- Firstly, select a prototype:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.common.prototype import OnPolicyAgent
```
</details>
- Secondly, select the necessary modules to build the agent:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.xploit.encoder import MnihCnnEncoder
from rllte.xploit.policy import OnPolicySharedActorCritic
from rllte.xploit.storage import VanillaRolloutStorage
from rllte.xplore.distribution import Categorical
```
- Run the `.describe` function of the selected policy and you will see the following output:
``` py
OnPolicySharedActorCritic.describe()
# Output:
# ================================================================================
# Name : OnPolicySharedActorCritic
# Structure : self.encoder (shared by actor and critic), self.actor, self.critic
# Forward : obs -> self.encoder -> self.actor -> actions
# : obs -> self.encoder -> self.critic -> values
# : actions -> log_probs
# Optimizers : self.optimizers['opt'] -> (self.encoder, self.actor, self.critic)
# ================================================================================
```
This illustrates the structure of the policy and indicates the optimizable parts.
</details>
- Thirdly, merge these modules and write an `.update` function:
<details>
<summary>Click to expand code</summary>
``` py
from torch import nn
import torch as th

class A2C(OnPolicyAgent):
    def __init__(self, env, tag, seed, device, num_steps) -> None:
        super().__init__(env=env, tag=tag, seed=seed, device=device, num_steps=num_steps)
        # create modules
        encoder = MnihCnnEncoder(observation_space=env.observation_space, feature_dim=512)
        policy = OnPolicySharedActorCritic(observation_space=env.observation_space,
                                           action_space=env.action_space,
                                           feature_dim=512,
                                           opt_class=th.optim.Adam,
                                           opt_kwargs=dict(lr=2.5e-4, eps=1e-5),
                                           init_fn="xavier_uniform"
                                           )
        storage = VanillaRolloutStorage(observation_space=env.observation_space,
                                        action_space=env.action_space,
                                        device=device,
                                        storage_size=self.num_steps,
                                        num_envs=self.num_envs,
                                        batch_size=256
                                        )
        dist = Categorical()
        # set all the modules
        self.set(encoder=encoder, policy=policy, storage=storage, distribution=dist)

    def update(self):
        for _ in range(4):
            for batch in self.storage.sample():
                # evaluate the sampled actions
                new_values, new_log_probs, entropy = self.policy.evaluate_actions(obs=batch.observations, actions=batch.actions)
                # policy loss part
                policy_loss = - (batch.adv_targ * new_log_probs).mean()
                # value loss part
                value_loss = 0.5 * (new_values.flatten() - batch.returns).pow(2).mean()
                # update
                self.policy.optimizers['opt'].zero_grad(set_to_none=True)
                (value_loss * 0.5 + policy_loss - entropy * 0.01).backward()
                nn.utils.clip_grad_norm_(self.policy.parameters(), 0.5)
                self.policy.optimizers['opt'].step()
```
</details>
- Finally, train the agent by running:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.env import make_atari_env
if __name__ == "__main__":
    device = "cuda"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    agent = A2C(env=env, tag="a2c_atari", seed=0, device=device, num_steps=128)
    agent.train(num_train_steps=10000000)
```
</details>
As shown in this example, only a few dozen lines of code are needed to create RL agents with **RLLTE**.
## Algorithm Decoupling and Module Replacement
**RLLTE** allows developers to replace the modules of implemented algorithms for performance comparison and algorithm improvement, and both built-in and custom modules are supported. Suppose we want to compare the effect of different encoders; it suffices to invoke the `.set` function:
``` py
from rllte.xploit.encoder import EspeholtResidualEncoder
encoder = EspeholtResidualEncoder(...)
agent.set(encoder=encoder)
```
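For instance, taking the A2C agent built above, the encoder can be swapped in before training. The constructor arguments below are an assumption that simply mirrors `MnihCnnEncoder` from the A2C example; check the encoder's documentation for the exact signature:
``` py
from rllte.xploit.encoder import EspeholtResidualEncoder

# assumed constructor arguments, mirroring MnihCnnEncoder in the A2C example
encoder = EspeholtResidualEncoder(observation_space=env.observation_space, feature_dim=512)
agent.set(encoder=encoder)
agent.train(num_train_steps=10000000)
```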
**RLLTE** is an extremely open framework that allows developers to try anything. For more detailed tutorials, see [Tutorials](https://docs.rllte.dev/tutorials).
# Function List (Part)
## RL Agents
| Type | Algo. | Box | Dis. | M.B. | M.D. | M.P. | NPU | πŸ’° | πŸ”­ |
|:-----------:|:------:|:---:|:----:|:----:|:----:|:----:|:---:|:---:|:---:|
| On-Policy | [A2C](https://arxiv.org/abs/1602.01783) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [PPO](https://arxiv.org/pdf/1707.06347) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [DrAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| On-Policy | [DAAC](http://proceedings.mlr.press/v139/raileanu21a/raileanu21a.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [DrDAAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| On-Policy | [PPG](http://proceedings.mlr.press/v139/cobbe21a/cobbe21a.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DQN](https://training.incf.org/sites/default/files/2023-05/Human-level%20control%20through%20deep%20reinforcement%20learning.pdf) | ❌ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DDPG](https://arxiv.org/pdf/1509.02971.pdf?source=post_page---------------------------) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [SAC](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [SAC-Discrete](https://arxiv.org/abs/1910.07207) | ❌ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [TD3](http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DrQ-v2](https://arxiv.org/pdf/2107.09645.pdf?utm_source=morioh.com) | βœ”οΈ | ❌ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| Distributed | [IMPALA](http://proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf) | βœ”οΈ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | ❌ | ❌ | ❌ |
> - `Dis., M.B., M.D.`: `Discrete`, `MultiBinary`, and `MultiDiscrete` action spaces;
> - `M.P.`: Multi-processing;
> - 🐌: Under development;
> - πŸ’°: Supports intrinsic reward shaping;
> - πŸ”­: Supports observation augmentation.
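Each built-in agent follows the same construction pattern shown earlier. As a rough sketch, training PPO on Atari could look like the snippet below; the `PPO` import path mirrors `from rllte.agent import DrQv2`, and the constructor is assumed to accept the same basic arguments, with all other hyper-parameters left at their defaults:
``` py
from rllte.env import make_atari_env
from rllte.agent import PPO  # assumed to be exposed like DrQv2 and A2C above

if __name__ == "__main__":
    device = "cuda:0"
    # eight parallel Atari environments, as in the A2C example
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    # assumed constructor arguments, mirroring the agents shown earlier
    agent = PPO(env=env, tag="ppo_atari", seed=0, device=device)
    agent.train(num_train_steps=10000000)
```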
## Intrinsic Reward Modules
| **Type** | **Modules** |
|--- |--- |
| Count-based | [PseudoCounts](https://arxiv.org/pdf/2002.06038), [RND](https://arxiv.org/pdf/1810.12894.pdf), [E3B](https://proceedings.neurips.cc/paper_files/paper/2022/file/f4f79698d48bdc1a6dec20583724182b-Paper-Conference.pdf) |
| Curiosity-driven | [ICM](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf), [GIRM](http://proceedings.mlr.press/v119/yu20d/yu20d.pdf), [RIDE](https://arxiv.org/pdf/2002.12292), [Disagreement](https://arxiv.org/pdf/1906.04161.pdf) |
| Memory-based | [NGU](https://arxiv.org/pdf/2002.06038) |
| Information theory-based | [RE3](http://proceedings.mlr.press/v139/seo21a/seo21a.pdf), [RISE](https://ieeexplore.ieee.org/abstract/document/9802917/), [REVD](https://openreview.net/pdf?id=V2pw1VYMrDo) |
See [Tutorials: Use Intrinsic Reward and Observation Augmentation](https://docs.rllte.dev/tutorials/data_augmentation) for usage examples.
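As a rough sketch of how such a module attaches to an agent (the `rllte.xplore.reward` module path, the `RE3` constructor arguments, and the `reward=` keyword of `.set` are assumptions; the linked tutorial is authoritative):
``` py
from rllte.xplore.reward import RE3  # assumed module path for intrinsic reward modules

# assumed constructor arguments; see the tutorial for the exact signature
irs = RE3(observation_space=env.observation_space,
          action_space=env.action_space,
          device=device)
# agents marked with πŸ’° in the table above are expected to accept this
agent.set(reward=irs)
```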
# RLLTE Ecosystem
Explore the ecosystem of RLLTE to facilitate your project:
- [Hub](https://docs.rllte.dev/benchmarks/): Fast training APIs and reusable benchmarks.
- [Evaluation](https://docs.rllte.dev/api/tutorials/): Reasonable and reliable metrics for algorithm evaluation.
- [Env](https://docs.rllte.dev/api/tutorials/): Packaged environments for fast invocation.
- [Deployment](https://docs.rllte.dev/api/tutorials/): Convenient APIs for model deployment.
- [Pre-training](https://docs.rllte.dev/api/tutorials/): Methods of pre-training in RL.
- [Copilot](https://docs.rllte.dev/copilot): Large language model-empowered copilot.
<!-- # API Documentation
View our well-designed documentation: [https://docs.rllte.dev/](https://docs.rllte.dev/)
<div align=center>
<img src='./docs/assets/images/docs.gif' style="width: 100%">
</div> -->
# How To Contribute
Contributions to this project are welcome! Before you begin writing code, please read [CONTRIBUTING.md](https://github.com/RLE-Foundation/rllte/blob/main/CONTRIBUTING.md) for guidance.
# Cite the Project
To cite this project in publications:
```bibtex
@article{yuan2023rllte,
title={RLLTE: Long-Term Evolution Project of Reinforcement Learning},
author={Mingqi Yuan and Zequn Zhang and Yang Xu and Shihao Luo and Bo Li and Xin Jin and Wenjun Zeng},
year={2023},
journal={arXiv preprint arXiv:2309.16382}
}
```
# Acknowledgment
This project is supported by [The Hong Kong Polytechnic University](http://www.polyu.edu.hk/), [Eastern Institute for Advanced Study](http://www.eias.ac.cn/), and [FLW-Foundation](FLW-Foundation). [EIAS HPC](https://hpc.eias.ac.cn/) provides a GPU computing platform, and [HUAWEI Ascend Community](https://www.hiascend.com/) provides an NPU computing platform for our testing. Some code in this project is borrowed from or inspired by several excellent projects, and we highly appreciate them. See [ACKNOWLEDGMENT.md](https://github.com/RLE-Foundation/rllte/blob/main/ACKNOWLEDGMENT.md).
<!-- # Miscellaneous
## ↳ Stargazers, thanks for your support!
[![Stargazers repo roster for @RLE-Foundation/rllte](https://reporoster.com/stars/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/stargazers)
## ↳ Forkers, thanks for your support!
[![Forkers repo roster for @RLE-Foundation/rllte](https://reporoster.com/forks/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/network/members)
## ↳ Star History
<div align="center">
[![Star History Chart](https://api.star-history.com/svg?repos=RLE-Foundation/rllte&type=Date)](https://star-history.com/#RLE-Foundation/rllte&Date)
</div> -->