# Generative Reinforcement Learning (GRL)
English | [简体中文(Simplified Chinese)](https://github.com/opendilab/GenerativeRL/blob/main/README.zh.md)
**GenerativeRL**, short for Generative Reinforcement Learning, is a Python library for solving reinforcement learning (RL) problems using generative models, such as diffusion models and flow models. This library aims to provide a framework for combining the power of generative models with the decision-making capabilities of reinforcement learning algorithms.
## Outline
- [Features](#features)
- [Framework Structure](#framework-structure)
- [Integrated Generative Models](#integrated-generative-models)
- [Integrated Algorithms](#integrated-algorithms)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Documentation](#documentation)
- [Tutorials](#tutorials)
- [Benchmark experiments](#benchmark-experiments)
## Features
- Support for training, evaluating, and deploying diverse generative models, including diffusion models and flow models
- Integration of generative models for state representation, action representation, policy learning, and dynamics model learning in RL
- Implementation of popular RL algorithms tailored for generative models, such as Q-guided policy optimization (QGPO)
- Support for various RL environments and benchmarks
- Easy-to-use API for training and evaluation
## Framework Structure
<p align="center">
<img src="assets/framework.png" alt="GenerativeRL framework structure" width="80%" height="auto" style="margin: 0 1%;">
</p>
## Integrated Generative Models
| | [Score Matching](https://ieeexplore.ieee.org/document/6795935) | [Flow Matching](https://arxiv.org/abs/2210.02747) |
|-------------------------------------------------------------------------------------| -------------------------------------------------------------- | ------------------------------------------------- |
| **Diffusion Model** [[Colab]](https://colab.research.google.com/drive/18yHUAmcMh_7xq2U6TBCtcLKX2y4YvNyk) | | |
| [Linear VP SDE](https://arxiv.org/abs/2011.13456) | ✔ | ✔ |
| [Generalized VP SDE](https://arxiv.org/abs/2209.15571) | ✔ | ✔ |
| [Linear SDE](https://arxiv.org/abs/2206.00364) | ✔ | ✔ |
| **Flow Model** [[Colab]](https://colab.research.google.com/drive/1vrxREVXKsSbnsv9G2CnKPVvrbFZleElI) | | |
| [Independent Conditional Flow Matching](https://arxiv.org/abs/2302.00482) | 🚫 | ✔ |
| [Optimal Transport Conditional Flow Matching](https://arxiv.org/abs/2302.00482) | 🚫 | ✔ |
## Integrated Algorithms
| Algo./Models | Diffusion Model | Flow Model |
|---------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- |
| [QGPO](https://arxiv.org/abs/2304.12824) | ✔ | 🚫 |
| [SRPO](https://arxiv.org/abs/2310.07297) | ✔ | 🚫 |
| GMPO | ✔ [[Colab]](https://colab.research.google.com/drive/1A79ueOdLvTfrytjOPyfxb6zSKXi1aePv) | ✔ |
| GMPG | ✔ [[Colab]](https://colab.research.google.com/drive/1hhMvQsrV-mruvpSCpmnsOxmCb6bMPOBq) | ✔ |
## Installation
```bash
pip install grl
```
Or, if you want to install from source:
```bash
git clone https://github.com/opendilab/GenerativeRL.git
cd GenerativeRL
pip install -e .
```
Alternatively, you can use the Docker image:
```bash
docker pull opendilab/grl:torch2.3.0-cuda12.1-cudnn8-runtime
docker run -it --rm --gpus all opendilab/grl:torch2.3.0-cuda12.1-cudnn8-runtime /bin/bash
```
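To verify the installation, you can try importing the package. The module path below is the one used in the Quick Start example; this is just a quick sanity check:
```bash
python -c "from grl.algorithms.qgpo import QGPOAlgorithm; print('GenerativeRL is installed')"
```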
## Quick Start
Here is an example of how to train a diffusion model for Q-guided policy optimization (QGPO) in the [LunarLanderContinuous-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/) environment using GenerativeRL.
Install the required dependencies:
```bash
pip install 'gym[box2d]==0.23.1'
```
(Gym versions from 0.23 to 0.25 work for the Box2D environments, but 0.23.1 is recommended for compatibility with D4RL.)
Download the dataset from [here](https://drive.google.com/file/d/1YnT-Oeu9LPKuS_ZqNc5kol_pMlJ1DwyG/view?usp=drive_link) and save it as `data.npz` in the current directory.
GenerativeRL uses Weights & Biases (WandB) for logging and will prompt you to log in to your account at runtime. You can disable online logging by running:
```bash
wandb offline
```
```python
import gym

from grl.algorithms.qgpo import QGPOAlgorithm
from grl.datasets import QGPOCustomizedDataset
from grl.utils.log import log
from grl_pipelines.diffusion_model.configurations.lunarlander_continuous_qgpo import config


def qgpo_pipeline(config):
    # Train the QGPO diffusion-model policy on the offline dataset.
    qgpo = QGPOAlgorithm(
        config,
        dataset=QGPOCustomizedDataset(
            numpy_data_path="./data.npz",
            action_augment_num=config.train.parameter.action_augment_num,
        ),
    )
    qgpo.train()

    # Deploy the trained policy and roll it out in the environment.
    agent = qgpo.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        env.render()
        observation, reward, done, _ = env.step(agent.act(observation))


if __name__ == '__main__':
    log.info("config: \n{}".format(config))
    qgpo_pipeline(config)
```
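The deployment loop above runs for a fixed number of steps and does not reset the environment between episodes. The sketch below shows one way to roll out complete episodes with the deployed agent; the `run_episodes` helper is illustrative and not part of the GenerativeRL API, and it assumes the gym 0.23 `reset`/`step` signatures used above:
```python
import gym


def run_episodes(agent, env_id, num_episodes=5, render=False):
    """Roll out a deployed agent for a number of full episodes and return the episode returns."""
    env = gym.make(env_id)
    returns = []
    for _ in range(num_episodes):
        observation = env.reset()
        episode_return, done = 0.0, False
        while not done:
            if render:
                env.render()
            # agent.act maps an observation to an action, as in the pipeline above.
            observation, reward, done, _ = env.step(agent.act(observation))
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return returns
```
For example, `run_episodes(agent, config.deploy.env.env_id)` could replace the fixed-step loop at the end of `qgpo_pipeline` when you want per-episode returns.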
For more detailed examples and documentation, please refer to the GenerativeRL documentation.
## Documentation
The full documentation for GenerativeRL can be found at [GenerativeRL Documentation](https://opendilab.github.io/GenerativeRL/).
## Tutorials
We provide several case tutorials to help you better understand GenerativeRL. See more at [tutorials](https://github.com/opendilab/GenerativeRL/tree/main/grl_pipelines/tutorials).
## Benchmark experiments
We offer some baseline experiments to evaluate the performance of generative reinforcement learning algorithms. See more at [benchmark](https://github.com/opendilab/GenerativeRL/tree/main/grl_pipelines/benchmark).
## Contributing
We welcome contributions to GenerativeRL! If you are interested in contributing, please refer to the [Contributing Guide](CONTRIBUTING.md).
## License
GenerativeRL is licensed under the Apache License 2.0. See [LICENSE](LICENSE) for more details.