<div align=center>
<br>
<img src='./docs/assets/images/logo_horizontal.svg' style="width: 55%">
<br>
## RLLTE: Long-Term Evolution Project of Reinforcement Learning
<!-- <h3> <a href="https://arxiv.org/pdf/2309.16382.pdf"> Paper </a> |
<a href="https://docs.rllte.dev/api/"> Documentation </a> |
<a href="https://docs.rllte.dev/tutorials/"> Tutorials </a> |
<a href="https://github.com/RLE-Foundation/rllte/discussions"> Forum </a> |
<a href="https://hub.rllte.dev/"> Benchmarks </a></h3> -->
<img src="https://img.shields.io/badge/License-MIT-%230677b8"> <img src="https://img.shields.io/badge/GPU-NVIDIA-%2377b900"> <img src="https://img.shields.io/badge/NPU-Ascend-%23c31d20"> <img src="https://img.shields.io/badge/Python-%3E%3D3.8-%2335709F"> <img src="https://img.shields.io/badge/Docs-Passing-%23009485"> <img src="https://img.shields.io/badge/Codestyle-Black-black"> <img src="https://img.shields.io/badge/PyPI-0.0.1-%23006DAD">
<!-- <img src="https://img.shields.io/badge/Coverage-97.00%25-green"> -->
<!-- | [English](README.md) | [δΈζ](docs/README-zh-Hans.md) | -->
</div>
<!-- # Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
+ [Installation](#installation)
+ [Fast Training with Built-in Algorithms](#fast-training-with-built-in-algorithms)
- [On NVIDIA GPU](#on-nvidia-gpu)
- [On HUAWEI NPU](#on-huawei-npu)
+ [Three Steps to Create Your RL Agent](#three-steps-to-create-your-rl-agent)
+ [Algorithm Decoupling and Module Replacement](#algorithm-decoupling-and-module-replacement)
- [Function List (Part)](#function-list-part)
+ [RL Agents](#rl-agents)
+ [Intrinsic Reward Modules](#intrinsic-reward-modules)
- [RLLTE Ecosystem](#rllte-ecosystem)
- [API Documentation](#api-documentation)
- [Cite the Project](#cite-the-project)
- [How To Contribute](#how-to-contribute)
- [Acknowledgment](#acknowledgment)
- [Miscellaneous](#miscellaneous) -->
<!-- # Overview -->
Inspired by the long-term evolution (LTE) standard project in telecommunications, **RLLTE** aims to provide development components and standards for advancing RL research and applications. Beyond delivering top-notch algorithm implementations, **RLLTE** also serves as a **toolkit** for developing algorithms.
<!-- <div align="center">
<a href="https://youtu.be/PMF6fa72bmE" rel="nofollow">
<img src='./docs/assets/images/youtube.png' style="width: 70%">
</a>
<br>
An introduction to RLLTE.
</div> -->
Why **RLLTE**?
- 🧬 Long-term evolution for providing the latest algorithms and tricks;
- 🏞️ Complete ecosystem for task design, model training, evaluation, and deployment (TensorRT, CANN, ...);
- 🧱 Module-oriented design for complete decoupling of RL algorithms;
- πŸš€ Optimized workflow for full hardware acceleration;
- βš™οΈ Support for custom environments and modules;
- πŸ–₯️ Support for multiple computing devices like GPU and NPU;
- πŸ’Ύ A large number of reusable benchmarks ([RLLTE Hub](https://hub.rllte.dev));
- πŸ€– A large language model-empowered copilot ([RLLTE Copilot](https://github.com/RLE-Foundation/rllte-copilot)).
> ⚠️ Since the construction of RLLTE Hub requires massive computing power, we have to upload the training datasets and model weights gradually. A progress report can be found in [Issue#30](https://github.com/RLE-Foundation/rllte/issues/30).
See the project structure below:
<div align=center>
<img src='./docs/assets/images/structure.svg' style="width: 100%">
</div>
For more detailed descriptions of these modules, see [API Documentation](https://docs.rllte.dev/api).
# Quick Start
## Installation
- with pip (recommended)
Open a terminal and install **rllte** with `pip`:
``` shell
conda create -n rllte python=3.8 # create a virtual environment
conda activate rllte             # activate the environment before installing
pip install rllte-core # basic installation
pip install rllte-core[envs] # for pre-defined environments
```
- with git
Open a terminal and clone the repository from [GitHub](https://github.com/RLE-Foundation/rllte) with `git`:
``` sh
git clone https://github.com/RLE-Foundation/rllte.git
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
```
For more detailed installation instructions, see [Getting Started](https://docs.rllte.dev/getting_started).
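To verify either installation, a quick sanity check with the Python standard library is enough (a minimal sketch; it only assumes the distribution name `rllte-core` and the import name `rllte` used throughout this README):
``` python
# confirm that the package imports and report the installed version
from importlib.metadata import version

import rllte  # should import without errors

print(version("rllte-core"))
```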
## Fast Training with Built-in Algorithms
**RLLTE** provides implementations of well-recognized RL algorithms and a simple interface for building applications.
### On NVIDIA GPU
Suppose we want to use [DrQ-v2](https://openreview.net/forum?id=_SJ-_yyes8) to solve a task from the [DeepMind Control Suite](https://github.com/deepmind/dm_control); it suffices to write a `train.py` like:
``` python
# import `env` and `agent` module
from rllte.env import make_dmc_env
from rllte.agent import DrQv2
if __name__ == "__main__":
    device = "cuda:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500000, log_interval=1000)
```
Run `train.py` and you will see the following output:
<div align=center>
<img src='./docs/assets/images/rl_training_gpu.gif' style="filter: drop-shadow(0px 0px 7px #000);">
</div>
### On HUAWEI NPU
Similarly, if we want to train an agent on HUAWEI NPU, it suffices to replace `cuda` with `npu`:
``` python
device = "cuda:0" -> device = "npu:0"
```
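For reference, the complete NPU version differs from the GPU example only in the device string (assuming the Ascend PyTorch plugin is installed so that `npu` devices are visible to PyTorch):
``` python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "npu:0"  # the only change from the GPU example
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    agent.train(num_train_steps=500000, log_interval=1000)
```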
## Three Steps to Create Your RL Agent
Developers only need three steps to implement an RL algorithm with **RLLTE**. The following example illustrates how to write an Advantage Actor-Critic (A2C) agent to solve Atari games.
- Firstly, select a prototype:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.common.prototype import OnPolicyAgent
```
</details>
- Secondly, select the necessary modules to build the agent:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.xploit.encoder import MnihCnnEncoder
from rllte.xploit.policy import OnPolicySharedActorCritic
from rllte.xploit.storage import VanillaRolloutStorage
from rllte.xplore.distribution import Categorical
```
- Run the `.describe` function of the selected policy and you will see the following output:
``` py
OnPolicySharedActorCritic.describe()
# Output:
# ================================================================================
# Name : OnPolicySharedActorCritic
# Structure : self.encoder (shared by actor and critic), self.actor, self.critic
# Forward : obs -> self.encoder -> self.actor -> actions
# : obs -> self.encoder -> self.critic -> values
# : actions -> log_probs
# Optimizers : self.optimizers['opt'] -> (self.encoder, self.actor, self.critic)
# ================================================================================
```
This illustrates the structure of the policy and indicates the optimizable parts.
</details>
- Thirdly, merge these modules and write an `.update` function:
<details>
<summary>Click to expand code</summary>
``` py
from torch import nn
import torch as th

class A2C(OnPolicyAgent):
    def __init__(self, env, tag, seed, device, num_steps) -> None:
        super().__init__(env=env, tag=tag, seed=seed, device=device, num_steps=num_steps)
        # create modules
        encoder = MnihCnnEncoder(observation_space=env.observation_space, feature_dim=512)
        policy = OnPolicySharedActorCritic(observation_space=env.observation_space,
                                           action_space=env.action_space,
                                           feature_dim=512,
                                           opt_class=th.optim.Adam,
                                           opt_kwargs=dict(lr=2.5e-4, eps=1e-5),
                                           init_fn="xavier_uniform"
                                           )
        storage = VanillaRolloutStorage(observation_space=env.observation_space,
                                        action_space=env.action_space,
                                        device=device,
                                        storage_size=self.num_steps,
                                        num_envs=self.num_envs,
                                        batch_size=256
                                        )
        dist = Categorical()
        # set all the modules
        self.set(encoder=encoder, policy=policy, storage=storage, distribution=dist)

    def update(self):
        for _ in range(4):
            for batch in self.storage.sample():
                # evaluate the sampled actions
                new_values, new_log_probs, entropy = self.policy.evaluate_actions(obs=batch.observations, actions=batch.actions)
                # policy loss part
                policy_loss = - (batch.adv_targ * new_log_probs).mean()
                # value loss part
                value_loss = 0.5 * (new_values.flatten() - batch.returns).pow(2).mean()
                # update
                self.policy.optimizers['opt'].zero_grad(set_to_none=True)
                (value_loss * 0.5 + policy_loss - entropy * 0.01).backward()
                nn.utils.clip_grad_norm_(self.policy.parameters(), 0.5)
                self.policy.optimizers['opt'].step()
```
</details>
- Finally, train the agent by running:
<details>
<summary>Click to expand code</summary>
``` py
from rllte.env import make_atari_env
if __name__ == "__main__":
    device = "cuda"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    agent = A2C(env=env, tag="a2c_atari", seed=0, device=device, num_steps=128)
    agent.train(num_train_steps=10000000)
```
</details>
As shown in this example, only a few dozen lines of code are needed to create RL agents with **RLLTE**.
## Algorithm Decoupling and Module Replacement
**RLLTE** allows developers to replace the modules of implemented algorithms for performance comparison and algorithm improvement, and both built-in and custom modules are supported. Suppose we want to compare the effect of different encoders; it suffices to invoke the `.set` function:
``` py
from rllte.xploit.encoder import EspeholtResidualEncoder
encoder = EspeholtResidualEncoder(...)
agent.set(encoder=encoder)
```
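For instance, taking the A2C agent built above, the encoder can be swapped in before training. The constructor arguments below are an assumption that simply mirrors `MnihCnnEncoder` from the A2C example; check the encoder's documentation for the exact signature:
``` py
from rllte.xploit.encoder import EspeholtResidualEncoder

# assumed constructor arguments, mirroring MnihCnnEncoder in the A2C example
encoder = EspeholtResidualEncoder(observation_space=env.observation_space, feature_dim=512)
agent.set(encoder=encoder)
agent.train(num_train_steps=10000000)
```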
**RLLTE** is an extremely open framework that allows developers to try anything. For more detailed tutorials, see [Tutorials](https://docs.rllte.dev/tutorials).
# Function List (Part)
## RL Agents
| Type | Algo. | Box | Dis. | M.B. | M.D. | M.P. | NPU | πŸ’° | πŸ”­ |
|:-----------:|:------:|:---:|:----:|:----:|:----:|:----:|:---:|:---:|:---:|
| On-Policy | [A2C](https://arxiv.org/abs/1602.01783) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [PPO](https://arxiv.org/pdf/1707.06347) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [DrAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| On-Policy | [DAAC](http://proceedings.mlr.press/v139/raileanu21a/raileanu21a.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| On-Policy | [DrDAAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| On-Policy | [PPG](http://proceedings.mlr.press/v139/cobbe21a/cobbe21a.pdf) | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DQN](https://training.incf.org/sites/default/files/2023-05/Human-level%20control%20through%20deep%20reinforcement%20learning.pdf) | ❌ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DDPG](https://arxiv.org/pdf/1509.02971.pdf?source=post_page---------------------------) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [SAC](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [SAC-Discrete](https://arxiv.org/abs/1910.07207) | ❌ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [TD3](http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf) | βœ”οΈ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ | ❌ |
| Off-Policy | [DrQ-v2](https://arxiv.org/pdf/2107.09645.pdf?utm_source=morioh.com) | βœ”οΈ | ❌ | ❌ | ❌ | ❌ | βœ”οΈ | βœ”οΈ | βœ”οΈ |
| Distributed | [IMPALA](http://proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf) | βœ”οΈ | βœ”οΈ | ❌ | ❌ | βœ”οΈ | ❌ | ❌ | ❌ |
> - `Dis., M.B., M.D.`: `Discrete`, `MultiBinary`, and `MultiDiscrete` action spaces;
> - `M.P.`: Multi-processing;
> - 🐌: Under development;
> - πŸ’°: Supports intrinsic reward shaping;
> - πŸ”­: Supports observation augmentation.
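Each built-in agent follows the same construction pattern shown earlier. As a rough sketch, training PPO on Atari could look like the snippet below; the `PPO` import path mirrors `from rllte.agent import DrQv2`, and the constructor is assumed to accept the same basic arguments, with all other hyper-parameters left at their defaults:
``` py
from rllte.env import make_atari_env
from rllte.agent import PPO  # assumed to be exposed like DrQv2 and A2C above

if __name__ == "__main__":
    device = "cuda:0"
    # eight parallel Atari environments, as in the A2C example
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    # assumed constructor arguments, mirroring the agents shown earlier
    agent = PPO(env=env, tag="ppo_atari", seed=0, device=device)
    agent.train(num_train_steps=10000000)
```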
## Intrinsic Reward Modules
| **Type** | **Modules** |
|--- |--- |
| Count-based | [PseudoCounts](https://arxiv.org/pdf/2002.06038), [RND](https://arxiv.org/pdf/1810.12894.pdf), [E3B](https://proceedings.neurips.cc/paper_files/paper/2022/file/f4f79698d48bdc1a6dec20583724182b-Paper-Conference.pdf) |
| Curiosity-driven | [ICM](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf), [GIRM](http://proceedings.mlr.press/v119/yu20d/yu20d.pdf), [RIDE](https://arxiv.org/pdf/2002.12292), [Disagreement](https://arxiv.org/pdf/1906.04161.pdf) |
| Memory-based | [NGU](https://arxiv.org/pdf/2002.06038) |
| Information theory-based | [RE3](http://proceedings.mlr.press/v139/seo21a/seo21a.pdf), [RISE](https://ieeexplore.ieee.org/abstract/document/9802917/), [REVD](https://openreview.net/pdf?id=V2pw1VYMrDo) |
See [Tutorials: Use Intrinsic Reward and Observation Augmentation](https://docs.rllte.dev/tutorials/data_augmentation) for usage examples.
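As a rough sketch of how such a module attaches to an agent (the `rllte.xplore.reward` module path, the `RE3` constructor arguments, and the `reward=` keyword of `.set` are assumptions; the linked tutorial is authoritative):
``` py
from rllte.xplore.reward import RE3  # assumed module path for intrinsic reward modules

# assumed constructor arguments; see the tutorial for the exact signature
irs = RE3(observation_space=env.observation_space,
          action_space=env.action_space,
          device=device)
# agents marked with πŸ’° in the table above are expected to accept this
agent.set(reward=irs)
```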
# RLLTE Ecosystem
Explore the ecosystem of RLLTE to facilitate your project:
- [Hub](https://docs.rllte.dev/benchmarks/): Fast training APIs and reusable benchmarks.
- [Evaluation](https://docs.rllte.dev/api/tutorials/): Reasonable and reliable metrics for algorithm evaluation.
- [Env](https://docs.rllte.dev/api/tutorials/): Packaged environments for fast invocation.
- [Deployment](https://docs.rllte.dev/api/tutorials/): Convenient APIs for model deployment.
- [Pre-training](https://docs.rllte.dev/api/tutorials/): Methods of pre-training in RL.
- [Copilot](https://docs.rllte.dev/copilot): Large language model-empowered copilot.
<!-- # API Documentation
View our well-designed documentation: [https://docs.rllte.dev/](https://docs.rllte.dev/)
<div align=center>
<img src='./docs/assets/images/docs.gif' style="width: 100%">
</div> -->
# How To Contribute
Contributions to this project are welcome! Before you begin writing code, please read [CONTRIBUTING.md](https://github.com/RLE-Foundation/rllte/blob/main/CONTRIBUTING.md) for guidance.
# Cite the Project
To cite this project in publications:
```bibtex
@article{yuan2023rllte,
title={RLLTE: Long-Term Evolution Project of Reinforcement Learning},
author={Mingqi Yuan and Zequn Zhang and Yang Xu and Shihao Luo and Bo Li and Xin Jin and Wenjun Zeng},
year={2023},
journal={arXiv preprint arXiv:2309.16382}
}
```
# Acknowledgment
This project is supported by [The Hong Kong Polytechnic University](http://www.polyu.edu.hk/), [Eastern Institute for Advanced Study](http://www.eias.ac.cn/), and [FLW-Foundation](FLW-Foundation). [EIAS HPC](https://hpc.eias.ac.cn/) provides a GPU computing platform, and [HUAWEI Ascend Community](https://www.hiascend.com/) provides an NPU computing platform for our testing. Some code in this project is borrowed from or inspired by several excellent projects, and we highly appreciate them. See [ACKNOWLEDGMENT.md](https://github.com/RLE-Foundation/rllte/blob/main/ACKNOWLEDGMENT.md).
<!-- # Miscellaneous
## ↳ Stargazers, thanks for your support!
[![Stargazers repo roster for @RLE-Foundation/rllte](https://reporoster.com/stars/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/stargazers)
## ↳ Forkers, thanks for your support!
[![Forkers repo roster for @RLE-Foundation/rllte](https://reporoster.com/forks/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/network/members)
## ↳ Star History
<div align="center">
[![Star History Chart](https://api.star-history.com/svg?repos=RLE-Foundation/rllte&type=Date)](https://star-history.com/#RLE-Foundation/rllte&Date)
</div> -->