rllte-core

Name: rllte-core
Version: 0.0.1b13
Summary: Long-Term Evolution Project of Reinforcement Learning
Upload time: 2024-02-29 22:52:42
Requires-Python: >=3.8
Keywords: algorithm, baseline, evolution, reinforcement learning

            <div align=center>
<br>
<img src='./docs/assets/images/logo_horizontal.svg' style="width: 75%">
<br>
RLLTE: Long-Term Evolution Project of Reinforcement Learning

<h3> <a href="https://arxiv.org/pdf/2309.16382.pdf"> Paper </a> |
<a href="https://docs.rllte.dev/api/"> Documentation </a> |
<a href="https://docs.rllte.dev/tutorials/"> Tutorials </a> |
<a href="https://github.com/RLE-Foundation/rllte/discussions"> Forum </a> |
<a href="https://hub.rllte.dev/"> Benchmarks </a></h3>

<img src="https://img.shields.io/badge/License-MIT-%230677b8"> <img src="https://img.shields.io/badge/GPU-NVIDIA-%2377b900"> <img src="https://img.shields.io/badge/NPU-Ascend-%23c31d20"> <img src="https://img.shields.io/badge/Python-%3E%3D3.8-%2335709F"> <img src="https://img.shields.io/badge/Docs-Passing-%23009485"> <img src="https://img.shields.io/badge/Codestyle-Black-black"> <img src="https://img.shields.io/badge/PyPI-0.0.1-%23006DAD"> <img src="https://img.shields.io/badge/Coverage-97.00%25-green"> 

| [English](README.md) | [δΈ­ζ–‡](docs/README-zh-Hans.md) |

</div>

# Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
  + [Installation](#installation)
  + [Fast Training with Built-in Algorithms](#fast-training-with-built-in-algorithms)
    - [On NVIDIA GPU](#on-nvidia-gpu)
    - [On HUAWEI NPU](#on-huawei-npu)
  + [Three Steps to Create Your RL Agent](#three-steps-to-create-your-rl-agent)
  + [Algorithm Decoupling and Module Replacement](#algorithm-decoupling-and-module-replacement)
- [Function List (Part)](#function-list-part)
  + [RL Agents](#rl-agents)
  + [Intrinsic Reward Modules](#intrinsic-reward-modules)
- [RLLTE Ecosystem](#rllte-ecosystem)
- [API Documentation](#api-documentation)
- [Cite the Project](#cite-the-project)
- [How To Contribute](#how-to-contribute)
- [Acknowledgment](#acknowledgment)
- [Miscellaneous](#miscellaneous)

# Overview
Inspired by the long-term evolution (LTE) standard project in telecommunications, **RLLTE** aims to provide development components and standards for advancing RL research and applications. Beyond delivering top-notch algorithm implementations, **RLLTE** also serves as a **toolkit** for developing algorithms.

<div align="center">
<a href="https://youtu.be/PMF6fa72bmE" rel="nofollow">
<img src='./docs/assets/images/youtube.png' style="width: 70%">
</a>
<br>
An introduction to RLLTE.
</div>

Why **RLLTE**?
- 🧬 Long-term evolution for providing the latest algorithms and tricks;
- 🏞️ Complete ecosystem for task design, model training, evaluation, and deployment (TensorRT, CANN, ...);
- 🧱 Module-oriented design for complete decoupling of RL algorithms;
- πŸš€ Optimized workflow for full hardware acceleration;
- βš™οΈ Support custom environments and modules;
- πŸ–₯️ Support multiple computing devices like GPU and NPU;
- πŸ’Ύ Large number of reusable benchmarks ([RLLTE Hub](https://hub.rllte.dev));
- πŸ‘¨β€βœˆοΈ Large language model-empowered copilot ([RLLTE Copilot](https://github.com/RLE-Foundation/rllte-copilot)).

> ⚠️ Since the construction of RLLTE Hub requires massive computing power, we have to upload the training datasets and model weights gradually. A progress report can be found in [Issue#30](https://github.com/RLE-Foundation/rllte/issues/30).

See the project structure below:
<div align=center>
<img src='./docs/assets/images/structure.svg' style="width: 100%">
</div>

For more detailed descriptions of these modules, see [API Documentation](https://docs.rllte.dev/api).

# Quick Start
## Installation
- Prerequisites

Currently, we recommend `Python>=3.8`, and users can create and activate a virtual environment with
``` sh
conda create -n rllte python=3.8
conda activate rllte
```

- with pip (recommended)

Open a terminal and install **rllte** with `pip`:
``` shell
pip install rllte-core # basic installation
pip install rllte-core[envs] # for pre-defined environments
```

- with git

Open a terminal and clone the repository from [GitHub](https://github.com/RLE-Foundation/rllte) with `git`:
``` sh
git clone https://github.com/RLE-Foundation/rllte.git
```
After that, run the following commands to install the package and its dependencies:
``` sh
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
```

For more detailed installation instructions, see [Getting Started](https://docs.rllte.dev/getting_started).
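
To verify the installation, a quick import check like the following should succeed (a minimal sketch; importing `rllte.env` helpers such as `make_dmc_env` additionally requires the `[envs]` extra):
``` python
# Sanity check: these imports mirror the examples below.
from rllte.agent import DrQv2        # built-in agent
from rllte.env import make_dmc_env   # pre-defined environment (needs the [envs] extra)

print("rllte-core imported successfully")
```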

## Fast Training with Built-in Algorithms
**RLLTE** provides implementations of well-recognized RL algorithms and a simple interface for building applications.
### On NVIDIA GPU
Suppose we want to use [DrQ-v2](https://openreview.net/forum?id=_SJ-_yyes8) to solve a task from the [DeepMind Control Suite](https://github.com/deepmind/dm_control); it suffices to write a `train.py` like:

``` python
# import `env` and `agent` module
from rllte.env import make_dmc_env 
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=500000, log_interval=1000)
```
Run `train.py` and you will see the following output:

<div align=center>
<img src='./docs/assets/images/rl_training_gpu.gif' style="filter: drop-shadow(0px 0px 7px #000);">
</div>

### On HUAWEI NPU
Similarly, if we want to train an agent on HUAWEI NPU, it suffices to replace `cuda` with `npu`:
``` python
device = "cuda:0" -> device = "npu:0"
```
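
For completeness, here is a minimal sketch of the resulting NPU script, mirroring the GPU example above (it assumes the Ascend PyTorch adapter, e.g. `torch_npu`, is installed so that `npu` devices are visible to PyTorch):
``` python
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "npu:0"  # the only change from the GPU script
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    agent = DrQv2(env=env, eval_env=eval_env, device=device, tag="drqv2_dmc_pixel")
    agent.train(num_train_steps=500000, log_interval=1000)
```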

## Three Steps to Create Your RL Agent
Developers only need three steps to implement an RL algorithm with **RLLTE**. The following example illustrates how to write an Advantage Actor-Critic (A2C) agent to solve Atari games. 
- Firstly, select a prototype:
``` py
from rllte.common.prototype import OnPolicyAgent
```
- Secondly, select necessary modules to build the agent:
``` py
from rllte.xploit.encoder import MnihCnnEncoder
from rllte.xploit.policy import OnPolicySharedActorCritic
from rllte.xploit.storage import VanillaRolloutStorage
from rllte.xplore.distribution import Categorical
```
- Run the `.describe` function of the selected policy and you will see the following output:
``` py
OnPolicySharedActorCritic.describe()
# Output:
# ================================================================================
# Name       : OnPolicySharedActorCritic
# Structure  : self.encoder (shared by actor and critic), self.actor, self.critic
# Forward    : obs -> self.encoder -> self.actor -> actions
#            : obs -> self.encoder -> self.critic -> values
#            : actions -> log_probs
# Optimizers : self.optimizers['opt'] -> (self.encoder, self.actor, self.critic)
# ================================================================================
```
This will illustrate the structure of the policy and indicate the optimizable parts. Finally, merge these modules and write an `.update` function:
``` py
from torch import nn
import torch as th

class A2C(OnPolicyAgent):
    def __init__(self, env, tag, seed, device, num_steps) -> None:
        super().__init__(env=env, tag=tag, seed=seed, device=device, num_steps=num_steps)
        # create modules
        encoder = MnihCnnEncoder(observation_space=env.observation_space, feature_dim=512)
        policy = OnPolicySharedActorCritic(observation_space=env.observation_space,
                                           action_space=env.action_space,
                                           feature_dim=512,
                                           opt_class=th.optim.Adam,
                                           opt_kwargs=dict(lr=2.5e-4, eps=1e-5),
                                           init_fn="xavier_uniform"
                                           )
        storage = VanillaRolloutStorage(observation_space=env.observation_space,
                                        action_space=env.action_space,
                                        device=device,
                                        storage_size=self.num_steps,
                                        num_envs=self.num_envs,
                                        batch_size=256
                                        )
        dist = Categorical()
        # set all the modules
        self.set(encoder=encoder, policy=policy, storage=storage, distribution=dist)
    
    def update(self):
        for _ in range(4):
            for batch in self.storage.sample():
                # evaluate the sampled actions
                new_values, new_log_probs, entropy = self.policy.evaluate_actions(obs=batch.observations, actions=batch.actions)
                # policy loss part
                policy_loss = - (batch.adv_targ * new_log_probs).mean()
                # value loss part
                value_loss = 0.5 * (new_values.flatten() - batch.returns).pow(2).mean()
                # update
                self.policy.optimizers['opt'].zero_grad(set_to_none=True)
                (value_loss * 0.5 + policy_loss - entropy * 0.01).backward()
                nn.utils.clip_grad_norm_(self.policy.parameters(), 0.5)
                self.policy.optimizers['opt'].step()
```
Then train the agent by running:
``` py
from rllte.env import make_atari_env
if __name__ == "__main__":
    device = "cuda"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    agent = A2C(env=env, tag="a2c_atari", seed=0, device=device, num_steps=128)
    agent.train(num_train_steps=10000000)
```
As shown in this example, only a few dozen lines of code are needed to create RL agents with **RLLTE**. 

## Algorithm Decoupling and Module Replacement
**RLLTE** allows developers to replace the modules of implemented algorithms to compare performance and improve algorithms, and both built-in and custom modules are supported. Suppose we want to compare the effect of different encoders; it suffices to invoke the `.set` function:
``` py
from rllte.xploit.encoder import EspeholtResidualEncoder
encoder = EspeholtResidualEncoder(...)
agent.set(encoder=encoder)
```
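
As a hedged end-to-end sketch of this workflow (it assumes `EspeholtResidualEncoder` accepts `observation_space` and `feature_dim` like `MnihCnnEncoder`, and that `PPO` follows the same constructor pattern as the agents above; check the API docs for the exact signatures):
``` py
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xploit.encoder import EspeholtResidualEncoder

if __name__ == "__main__":
    device = "cuda"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    # create a built-in agent, then swap its default encoder before training
    agent = PPO(env=env, tag="ppo_atari_residual", seed=0, device=device)
    encoder = EspeholtResidualEncoder(observation_space=env.observation_space, feature_dim=256)
    agent.set(encoder=encoder)
    agent.train(num_train_steps=1000000)
```
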
**RLLTE** is an extremely open framework that allows developers to try anything. For more detailed tutorials, see [Tutorials](https://docs.rllte.dev/tutorials).

# Function List (Part)
## RL Agents
|     Type    |  Algo. | Box | Dis. | M.B. | M.D. | M.P. | NPU |πŸ’°|πŸ”­|
|:-----------:|:------:|:---:|:----:|:----:|:----:|:------:|:---:|:------:|:---:|
| On-Policy   | [A2C](https://arxiv.org/abs/1602.01783)    | βœ”οΈ   | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| On-Policy   | [PPO](https://arxiv.org/pdf/1707.06347)    | βœ”οΈ   | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| On-Policy   | [DrAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf)| βœ”οΈ   | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    | βœ”οΈ   |
| On-Policy   | [DAAC](http://proceedings.mlr.press/v139/raileanu21a/raileanu21a.pdf)| βœ”οΈ   | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    | ❌   |
| On-Policy   | [DrDAAC](https://proceedings.neurips.cc/paper/2021/file/2b38c2df6a49b97f706ec9148ce48d86-Paper.pdf)| βœ”οΈ   | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    | βœ”οΈ   |
| On-Policy   | [PPG](http://proceedings.mlr.press/v139/cobbe21a/cobbe21a.pdf)| βœ”οΈ   | βœ”οΈ    | βœ”οΈ    |  ❌   | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    | ❌   |
| Off-Policy  | [DQN](https://training.incf.org/sites/default/files/2023-05/Human-level%20control%20through%20deep%20reinforcement%20learning.pdf) | βœ”οΈ   | ❌    | ❌    | ❌    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    | ❌   |
| Off-Policy  | [DDPG](https://arxiv.org/pdf/1509.02971.pdf?source=post_page---------------------------)| βœ”οΈ   | ❌    | ❌    | ❌    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| Off-Policy  | [SAC](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf)| βœ”οΈ   | ❌    | ❌    | ❌    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| Off-Policy  | [SAC-Discrete](https://arxiv.org/abs/1910.07207)|  ❌  | βœ”οΈ    | ❌    | ❌    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| Off-Policy  | [TD3](http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf)| βœ”οΈ   | ❌    | ❌    | ❌    | βœ”οΈ    | βœ”οΈ   |βœ”οΈ    |❌    |
| Off-Policy  | [DrQ-v2](https://arxiv.org/pdf/2107.09645.pdf?utm_source=morioh.com)| βœ”οΈ   | ❌    | ❌    | ❌    | ❌    | βœ”οΈ   |βœ”οΈ    |βœ”οΈ    |
| Distributed | [IMPALA](http://proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf) | βœ”οΈ   | βœ”οΈ    | ❌    | ❌    | βœ”οΈ    | ❌   |❌    |❌    |

> - `Dis., M.B., M.D.`: `Discrete`, `MultiBinary`, and `MultiDiscrete` action spaces;
> - `M.P.`: multi-processing;
> - 🐌: Developing;
> - πŸ’°: Support intrinsic reward shaping;
> - πŸ”­: Support observation augmentation. 


## Intrinsic Reward Modules
| **Type** 	| **Modules** 	|
|---	|---	|
| Count-based 	| [PseudoCounts](https://arxiv.org/pdf/2002.06038), [RND](https://arxiv.org/pdf/1810.12894.pdf) 	|
| Curiosity-driven 	| [ICM](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf), [GIRM](http://proceedings.mlr.press/v119/yu20d/yu20d.pdf), [RIDE](https://arxiv.org/pdf/2002.12292) 	|
| Memory-based 	| [NGU](https://arxiv.org/pdf/2002.06038) 	|
| Information theory-based 	| [RE3](http://proceedings.mlr.press/v139/seo21a/seo21a.pdf), [RISE](https://ieeexplore.ieee.org/abstract/document/9802917/), [REVD](https://openreview.net/pdf?id=V2pw1VYMrDo) 	|

See [Tutorials: Use Intrinsic Reward and Observation Augmentation](https://docs.rllte.dev/tutorials/data_augmentation) for usage examples.
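
As a hedged illustration (it assumes intrinsic reward modules are exposed under `rllte.xplore.reward` and are attached via `agent.set(reward=...)`; see the tutorial above for the exact interface), intrinsic reward shaping might be enabled like this:
``` py
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xplore.reward import RE3

if __name__ == "__main__":
    device = "cuda"
    env = make_atari_env("PongNoFrameskip-v4", num_envs=8, seed=0, device=device)
    agent = PPO(env=env, tag="ppo_atari_re3", seed=0, device=device)
    # attach an information theory-based intrinsic reward module
    intrinsic_reward = RE3(observation_space=env.observation_space,
                           action_space=env.action_space,
                           device=device)
    agent.set(reward=intrinsic_reward)
    agent.train(num_train_steps=1000000)
```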

# RLLTE Ecosystem
Explore the ecosystem of RLLTE to facilitate your project:

- [Hub](https://docs.rllte.dev/benchmarks/): Fast training APIs and reusable benchmarks.
- [Evaluation](https://docs.rllte.dev/api/tutorials/): Reasonable and reliable metrics for algorithm evaluation.
- [Env](https://docs.rllte.dev/api/tutorials/): Packaged environments for fast invocation.
- [Deployment](https://docs.rllte.dev/api/tutorials/): Convenient APIs for model deployment.
- [Pre-training](https://docs.rllte.dev/api/tutorials/): Methods of pre-training in RL.
- [Copilot](https://docs.rllte.dev/copilot): Large language model-empowered copilot.

# API Documentation
View our well-designed documentation: [https://docs.rllte.dev/](https://docs.rllte.dev/)
<div align=center>
<img src='./docs/assets/images/docs.gif' style="width: 100%">
</div>

# How To Contribute
Contributions to this project are welcome! Before you begin writing code, please read [CONTRIBUTING.md](https://github.com/RLE-Foundation/rllte/blob/main/CONTRIBUTING.md) for guidance.

# Cite the Project
If you use **RLLTE** in your research, please cite the project as follows:
```bibtex
@article{yuan2023rllte,
  title={RLLTE: Long-Term Evolution Project of Reinforcement Learning}, 
  author={Mingqi Yuan and Zequn Zhang and Yang Xu and Shihao Luo and Bo Li and Xin Jin and Wenjun Zeng},
  year={2023},
  journal={arXiv preprint arXiv:2309.16382}
}
```

# Acknowledgment
This project is supported by [The Hong Kong Polytechnic University](http://www.polyu.edu.hk/), [Eastern Institute for Advanced Study](http://www.eias.ac.cn/), and [FLW-Foundation](FLW-Foundation). [EIAS HPC](https://hpc.eias.ac.cn/) provides a GPU computing platform, and [HUAWEI Ascend Community](https://www.hiascend.com/) provides an NPU computing platform for our testing. Some of this project's code is borrowed from or inspired by several excellent projects, and we highly appreciate them. See [ACKNOWLEDGMENT.md](https://github.com/RLE-Foundation/rllte/blob/main/ACKNOWLEDGMENT.md).

# Miscellaneous

## &#8627; Stargazers, thanks for your support!
[![Stargazers repo roster for @RLE-Foundation/rllte](https://reporoster.com/stars/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/stargazers)

## &#8627; Forkers, thanks for your support!
[![Forkers repo roster for @RLE-Foundation/rllte](https://reporoster.com/forks/RLE-Foundation/rllte)](https://github.com/RLE-Foundation/rllte/network/members)

## &#8627; Star History
<div align="center">

[![Star History Chart](https://api.star-history.com/svg?repos=RLE-Foundation/rllte&type=Date)](https://star-history.com/#RLE-Foundation/rllte&Date)

</div>

            
