openrl

Name: openrl
Version: 0.1.9
Home page: https://github.com/OpenRL-Lab/openrl
Summary: unified reinforcement learning framework
Upload time: 2023-10-27 03:41:46
Author: openrl contributors
Requires Python: >=3.8
Keywords: reinforcement-learning, multi-agent, reinforcement-learning-algorithms, pytorch, machine-learning, baselines, toolbox, python, data-science, gym, gymnasium

<div align="center">
    <a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
</div>

---
[![PyPI](https://img.shields.io/pypi/v/openrl)](https://pypi.org/project/openrl/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/openrl)
[![Anaconda-Server Badge](https://anaconda.org/openrl/openrl/badges/version.svg)](https://anaconda.org/openrl/openrl)
[![Anaconda-Server Badge](https://anaconda.org/openrl/openrl/badges/latest_release_date.svg)](https://anaconda.org/openrl/openrl)
[![Anaconda-Server Badge](https://anaconda.org/openrl/openrl/badges/downloads.svg)](https://anaconda.org/openrl/openrl)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![Hits-of-Code](https://hitsofcode.com/github/OpenRL-Lab/openrl?branch=main)](https://hitsofcode.com/github/OpenRL-Lab/openrl/view?branch=main)
[![codecov](https://codecov.io/gh/OpenRL-Lab/openrl/graph/badge.svg?token=T6BqaTiT0l)](https://codecov.io/gh/OpenRL-Lab/openrl)

[![Documentation Status](https://readthedocs.org/projects/openrl-docs/badge/?version=latest)](https://openrl-docs.readthedocs.io/en/latest/?badge=latest)
[![Read the Docs](https://img.shields.io/readthedocs/openrl-docs-zh?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://openrl-docs.readthedocs.io/zh/latest/)

![GitHub Org's stars](https://img.shields.io/github/stars/OpenRL-Lab)
[![GitHub stars](https://img.shields.io/github/stars/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/stargazers)
[![GitHub forks](https://img.shields.io/github/forks/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/network)
![GitHub commit activity](https://img.shields.io/github/commit-activity/m/OpenRL-Lab/openrl)
[![GitHub issues](https://img.shields.io/github/issues/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/issues)
[![GitHub pulls](https://img.shields.io/github/issues-pr/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/pulls)
[![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)

[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL v0.1.9 was released on Oct 20, 2023.

The main branch contains the latest version of OpenRL and is under active development. If you just want to try out
OpenRL, you can switch to the stable branch.

## Welcome to OpenRL

[Documentation](https://openrl-docs.readthedocs.io/en/latest/) | [中文介绍](README_zh.md) |  [中文文档](https://openrl-docs.readthedocs.io/zh/latest/)

<div align="center">
    Crafting Reinforcement Learning Frameworks with Passion. Your Valuable Insights Are Welcome.   <br><br>
</div>

OpenRL is an open-source general reinforcement learning research framework that supports training for various tasks
such as single-agent, multi-agent, offline RL, self-play, and natural language.
Developed based on PyTorch, the goal of OpenRL is to provide a simple-to-use, flexible, efficient and sustainable
platform for the reinforcement learning research community.

Currently, the features supported by OpenRL include:

- A **simple-to-use universal interface** that supports training for all tasks/environments

- Support for both single-agent and multi-agent tasks

- Support for offline RL training with expert dataset

- Support self-play training

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support for [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html), which allows convenient evaluation of
  various agents (including agents submitted to the [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl) platform) in a competitive environment.

- Support for importing models and datasets from [Hugging Face](https://huggingface.co/), including loading [Stable-baselines3 models from Hugging Face](https://openrl-docs.readthedocs.io/en/latest/sb3/index.html) for testing and training.

- [Tutorial](https://openrl-docs.readthedocs.io/en/latest/custom_env/index.html) on how to integrate user-defined environments into OpenRL (a minimal environment sketch follows this list).

- Support for models such as LSTM, GRU, and Transformer

- Multiple training acceleration methods, including automatic mixed-precision training and data collection with a
  half-precision policy network

- Support for user-defined training models, reward models, training data, and environments

- Support for [gymnasium](https://gymnasium.farama.org/) environments

- Support for [Callbacks](https://openrl-docs.readthedocs.io/en/latest/callbacks/index.html), which can be used to
  implement various functions such as logging, saving, and early stopping

- Dictionary observation space support

- Support for popular visualization tools such
  as [wandb](https://wandb.ai/) and [tensorboardX](https://tensorboardx.readthedocs.io/en/latest/index.html)

- Serial or parallel environment training while ensuring consistent results in both modes

- Chinese and English documentation

- Provides unit testing and code coverage testing

- Compliant with Black Code Style guidelines and type checking
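
As a rough illustration of what a user-defined environment looks like, below is a minimal sketch of a Gymnasium-compatible environment. How such an environment is registered with and trained through OpenRL is covered in the custom-environment tutorial linked above; the class and identifier names here are purely illustrative, not part of OpenRL's API.

```python
# A minimal, illustrative Gymnasium-compatible environment (names are hypothetical).
# See the custom-environment tutorial for how to register and train it with OpenRL.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class ReachTargetEnv(gym.Env):
    """Toy 1-D environment: move left/right until position 5 is reached."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0 = move left, 1 = move right
        self._pos = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = 0
        return np.array([self._pos], dtype=np.float32), {}

    def step(self, action):
        self._pos = min(10, max(0, self._pos + (1 if action == 1 else -1)))
        terminated = self._pos == 5
        reward = 1.0 if terminated else 0.0
        return np.array([self._pos], dtype=np.float32), reward, terminated, False, {}
```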

Algorithms currently supported by OpenRL (for more details, please refer to [Gallery](./Gallery.md)):

- [Proximal Policy Optimization (PPO)](https://arxiv.org/abs/1707.06347)
- [Dual-clip PPO](https://arxiv.org/abs/1912.09729)
- [Multi-agent PPO (MAPPO)](https://arxiv.org/abs/2103.01955)
- [Joint-ratio Policy Optimization (JRPO)](https://arxiv.org/abs/2302.07515)
- [Generative Adversarial Imitation Learning (GAIL)](https://arxiv.org/abs/1606.03476)
- [Behavior Cloning (BC)](http://www.cse.unsw.edu.au/~claude/papers/MI15.pdf)
- [Advantage Actor-Critic (A2C)](https://arxiv.org/abs/1602.01783)
- Self-Play
- [Deep Q-Network (DQN)](https://arxiv.org/abs/1312.5602)
- [Multi-Agent Transformer (MAT)](https://arxiv.org/abs/2205.14953)
- [Value-Decomposition Network (VDN)](https://arxiv.org/abs/1706.05296)
- [Soft Actor Critic (SAC)](https://arxiv.org/abs/1812.05905)
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/abs/1509.02971)

Environments currently supported by OpenRL (for more details, please refer to [Gallery](./Gallery.md)):

- [Gymnasium](https://gymnasium.farama.org/)
- [MuJoCo](https://github.com/deepmind/mujoco)
- [PettingZoo](https://pettingzoo.farama.org/)
- [MPE](https://github.com/openai/multiagent-particle-envs)
- [Chat Bot](https://openrl-docs.readthedocs.io/en/latest/quick_start/train_nlp.html)
- [Atari](https://gymnasium.farama.org/environments/atari/)
- [StarCraft II](https://github.com/oxwhirl/smac)
- [SMACv2](https://github.com/oxwhirl/smacv2)
- [Omniverse Isaac Gym](https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs)
- [DeepMind Control](https://shimmy.farama.org/environments/dm_control/)
- [Snake](http://www.jidiai.cn/env_detail?envid=1)
- [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)
- [GridWorld](./examples/gridworld/)
- [Super Mario Bros](https://github.com/Kautenja/gym-super-mario-bros)
- [Gym Retro](https://github.com/openai/retro)

This framework has undergone multiple iterations by the [OpenRL-Lab](https://github.com/OpenRL-Lab) team, which has
applied it in academic research, and it has now grown into a mature reinforcement learning framework.

OpenRL-Lab will continue to maintain and update OpenRL, and we welcome everyone to join
our [open-source community](./CONTRIBUTING.md)
to contribute towards the development of reinforcement learning.

For more information about OpenRL, please refer to the [documentation](https://openrl-docs.readthedocs.io/en/latest/).

## Outline

- [Welcome to OpenRL](#welcome-to-openrl)
- [Outline](#outline)
- [Why OpenRL?](#why-openrl)
- [Installation](#installation)
- [Use Docker](#use-docker)
- [Quick Start](#quick-start)
- [Gallery](#gallery)
- [Projects Using OpenRL](#projects-using-openrl)
- [Feedback and Contribution](#feedback-and-contribution)
- [Maintainers](#maintainers)
- [Supporters](#supporters)
    - [&#8627; Contributors](#-contributors)
    - [&#8627; Stargazers](#-stargazers)
    - [&#8627; Forkers](#-forkers)
- [Citing OpenRL](#citing-openrl)
- [Star History](#star-history)
- [License](#license)
- [Acknowledgments](#acknowledgments)

## Why OpenRL?

Here we provide a table for the comparison of OpenRL and existing popular RL libraries.
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.

|                              Library                               |      NLP/RLHF      |     Multi-agent      |  Self-Play Training  |     Offline RL     | Bilingual Document | 
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:| 
|         **[OpenRL](https://github.com/OpenRL-Lab/openrl)**         | :heavy_check_mark: |  :heavy_check_mark:  |  :heavy_check_mark:  | :heavy_check_mark: | :heavy_check_mark: |
|  [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)  |        :x:         |         :x:          |         :x:          |        :x:         |        :x:         |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) |        :x:         |  :heavy_check_mark:  |  :heavy_check_mark:  | :heavy_check_mark: |        :x:         |
|        [DI-engine](https://github.com/opendilab/DI-engine/)        |        :x:         |  :heavy_check_mark:  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
|           [Tianshou](https://github.com/thu-ml/tianshou)           |        :x:         | not fully supported  | not fully supported  | :heavy_check_mark: | :heavy_check_mark: |
|       [MARLlib](https://github.com/Replicable-MARL/MARLlib)        |        :x:         |  :heavy_check_mark:  | not fully supported  |        :x:         |        :x:         |
|   [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy)    |        :x:         |  :heavy_check_mark:  |         :x:          |        :x:         |        :x:         |
|            [RL4LMs](https://github.com/allenai/RL4LMs)             | :heavy_check_mark: |         :x:          |         :x:          |        :x:         |        :x:         |
|              [trlx](https://github.com/CarperAI/trlx)              | :heavy_check_mark: |         :x:          |         :x:          |        :x:         |        :x:         |
|             [trl](https://github.com/huggingface/trl)              | :heavy_check_mark: |         :x:          |         :x:          |        :x:         |        :x:         |
|       [TimeChamber](https://github.com/inspirai/TimeChamber)       |        :x:         |         :x:          |  :heavy_check_mark:  |        :x:         |        :x:         |

## Installation

Users can directly install OpenRL via pip:

```bash
pip install openrl
```

If users are using Anaconda or Miniconda, they can also install OpenRL via conda:

```bash
conda install -c openrl openrl
```

Users who want to modify the source code can also install OpenRL from source:

```bash
git clone https://github.com/OpenRL-Lab/openrl.git && cd openrl
pip install -e .
```

After installation, users can check the installed version of OpenRL from the command line:

```bash
openrl --version
```

**Tips**: No installation required, try OpenRL online through
Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/15VBA-B7AJF8dBazzRcWAxJxZI7Pl9m-g?usp=sharing)

## Use Docker

OpenRL currently provides Docker images with and without GPU support.
If the user's machine does not have an NVIDIA GPU, they can pull the CPU-only image with the following
command:

```bash
sudo docker pull openrllab/openrl-cpu
```

If the user wants to accelerate training with a GPU, they can pull the GPU image with the following command:

```bash
sudo docker pull openrllab/openrl
```

After successfully pulling the image, users can run OpenRL's Docker image using the following commands:

```bash
# Without GPU acceleration
sudo docker run -it openrllab/openrl-cpu
# With GPU acceleration 
sudo docker run -it --gpus all --net host openrllab/openrl
```

Once inside the Docker container, users can check OpenRL's version and then run test cases using these commands:

```bash 
# Check OpenRL version in Docker container  
openrl --version  
# Run test case  
openrl --mode train --env CartPole-v1  
```

## Quick Start

OpenRL provides a simple and easy-to-use interface for beginners in reinforcement learning.
Below is an example of using the PPO algorithm to train the `CartPole` environment:

```python
# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)  # Create an environment and set the environment parallelism to 9.
net = Net(env)  # Create neural network.
agent = Agent(net)  # Initialize the agent.
agent.train(total_time_steps=20000)  # Start training; run for a total of 20,000 environment steps.
```

Training an agent using OpenRL only requires four simple steps:
**Create Environment** => **Initialize Model** => **Initialize Agent** => **Start Training**!

For a well-trained agent, users can also easily test the agent:

```python
# train_ppo.py
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

agent = Agent(Net(make("CartPole-v1", env_num=9)))  # Initialize the agent.
agent.train(total_time_steps=20000)
# Create a test environment, set its parallelism to 9, and set the rendering mode to group_human.
env = make("CartPole-v1", env_num=9, render_mode="group_human")
agent.set_env(env)  # The agent requires an interactive environment.
obs, info = env.reset()  # Reset the environment to obtain the initial observations and info.
while True:
    action, _ = agent.act(obs)  # The agent predicts the next action based on the observations.
    # The environment takes one step according to the action and returns the next observation, reward, done flags, and info.
    obs, r, done, info = env.step(action)
    if any(done):
        break
env.close()  # Close the test environment.
```

Executing the above code on a regular laptop only takes **a few seconds**
to complete the training. Below shows the visualization of the agent:

<div align="center">
  <img src="docs/images/train_ppo_cartpole.gif">
</div>
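
In practice, a trained agent is usually saved to disk and reloaded later for evaluation rather than retrained each time. Below is a hedged sketch of that workflow; the `save()`/`load()` method names and the directory layout are assumptions not confirmed by this README, so check the documentation for the exact API.

```python
# Hypothetical sketch: persist the trained agent, then reload it for evaluation.
# The save()/load() method names and the directory layout are assumptions; see the docs.
from openrl.envs.common import make
from openrl.modules.common import PPONet as Net
from openrl.runners.common import PPOAgent as Agent

env = make("CartPole-v1", env_num=9)
agent = Agent(Net(env))
agent.train(total_time_steps=20000)
agent.save("./ppo_agent/")   # assumed helper: write the trained weights to a directory

# Later, or in a separate evaluation script: rebuild the agent and restore the weights.
agent = Agent(Net(make("CartPole-v1", env_num=9)))
agent.load("./ppo_agent/")   # assumed helper: restore the weights before calling agent.act()
```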


**Tips:** Users can also quickly train the `CartPole` environment by executing a single command in the terminal.

```bash
openrl --mode train --env CartPole-v1
```
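
The same Net/Agent pattern extends to the other algorithms listed earlier. Below is a hedged sketch of training the same environment with DQN; the `DQNNet`/`DQNAgent` class names are assumed by analogy with `PPONet`/`PPOAgent` and are not confirmed by this README, so verify them against the documentation or the [Gallery](./Gallery.md).

```python
# Hypothetical sketch: swap PPO for DQN while keeping the same training workflow.
# DQNNet/DQNAgent are assumed names, chosen by analogy with PPONet/PPOAgent.
from openrl.envs.common import make
from openrl.modules.common import DQNNet as Net      # assumed class name
from openrl.runners.common import DQNAgent as Agent  # assumed class name

env = make("CartPole-v1", env_num=9)   # Same environment factory as in the PPO example.
agent = Agent(Net(env))                # Same create-net-then-agent pattern.
agent.train(total_time_steps=20000)    # Same training entry point.
```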

For training tasks such as multi-agent and natural language processing, OpenRL also provides a similarly simple and
easy-to-use interface.

For information on how to perform multi-agent training, set hyperparameters for training, load training configurations,
use wandb, save GIF animations, etc., please refer to:

- [Multi-Agent Training Example](https://openrl-docs.readthedocs.io/en/latest/quick_start/multi_agent_RL.html)

For information on natural language task training, loading models/datasets on Hugging Face, customizing training
models/reward models, etc., please refer to:

- [Dialogue Task Training Example](https://openrl-docs.readthedocs.io/en/latest/quick_start/train_nlp.html)

For more information about OpenRL, please refer to the [documentation](https://openrl-docs.readthedocs.io/en/latest/).

## Gallery

In order to facilitate users' familiarity with the framework, we provide more examples and demos of using OpenRL
in [Gallery](./Gallery.md).
Users are also welcome to contribute their own training examples and demos to the Gallery.

## Projects Using OpenRL

We have listed research projects that use OpenRL in the [OpenRL Project](./Project.md).
If you are using OpenRL in your research project, you are also welcome to join this list.

## Feedback and Contribution

- If you have any questions or find bugs, you can check or ask in
  the [Issues](https://github.com/OpenRL-Lab/openrl/issues).
- Join the QQ group: [OpenRL Official Communication Group](docs/images/qq.png)

<div align="center">
<a href="docs/images/qq.png"><img width="250px" height="auto" src="docs/images/qq.png"></a>
</div>

- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
  OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [huangshiyu@4paradigm.com](mailto:huangshiyu@4paradigm.com)
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

The OpenRL framework is still under active development, and its documentation is still being written.
We welcome you to join us in making this project better:

- How to contribute code: Read the [Contributors' Guide](./CONTRIBUTING.md)
- [OpenRL Roadmap](https://github.com/OpenRL-Lab/openrl/issues/2)

## Maintainers

At present, OpenRL is maintained by the following maintainers:

- [Shiyu Huang](https://huangshiyu13.github.io/)([@huangshiyu13](https://github.com/huangshiyu13))
- Wenze Chen([@Chen001117](https://github.com/Chen001117))
- Yiwen Sun([@YiwenAI](https://github.com/YiwenAI))

We welcome more contributors to join our maintenance team (send an E-mail
to [huangshiyu@4paradigm.com](mailto:huangshiyu@4paradigm.com)
to apply to join the OpenRL team).

## Supporters

### &#8627; Contributors

<a href="https://github.com/OpenRL-Lab/openrl/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=OpenRL-Lab/openrl" />
</a>

### &#8627; Stargazers

[![Stargazers repo roster for @OpenRL-Lab/openrl](https://reporoster.com/stars/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/stargazers)

### &#8627; Forkers

[![Forkers repo roster for @OpenRL-Lab/openrl](https://reporoster.com/forks/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/network/members)

## Citing OpenRL

If our work has been helpful to you, please feel free to cite us:

```latex
@misc{openrl2023,
    title = {OpenRL},
    author = {OpenRL Contributors},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/OpenRL-Lab/openrl}},
    year = {2023},
}
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=OpenRL-Lab/openrl&type=Date)](https://star-history.com/#OpenRL-Lab/openrl&Date)

## License

OpenRL is released under the Apache 2.0 license.

## Acknowledgments

The development of the OpenRL framework has drawn on the strengths of other reinforcement learning frameworks:

- Stable-baselines3: https://github.com/DLR-RM/stable-baselines3
- pytorch-a2c-ppo-acktr-gail: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
- MAPPO: https://github.com/marlbenchmark/on-policy
- Gymnasium: https://github.com/Farama-Foundation/Gymnasium
- DI-engine: https://github.com/opendilab/DI-engine/
- Tianshou: https://github.com/thu-ml/tianshou
- RL4LMs: https://github.com/allenai/RL4LMs

            
