<div align="center">
<a href="http://www.offline-saferl.org"><img width="300px" height="auto" src="https://github.com/liuzuxin/osrl/raw/main/docs/_static/images/osrl-logo.png"></a>
</div>
<br/>
<div align="center">
<a>![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-brightgreen.svg)</a>
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](#license)
[![PyPI](https://img.shields.io/pypi/v/osrl?logo=pypi)](https://pypi.org/project/osrl)
[![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/osrl?color=brightgreen&logo=github)](https://github.com/liuzuxin/osrl/stargazers)
[![Downloads](https://static.pepy.tech/personalized-badge/osrl?period=total&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/osrl)
<!-- [![Documentation Status](https://img.shields.io/readthedocs/fsrl?logo=readthedocs)](https://fsrl.readthedocs.io) -->
<!-- [![CodeCov](https://codecov.io/github/liuzuxin/fsrl/branch/main/graph/badge.svg?token=BU27LTW9F3)](https://codecov.io/github/liuzuxin/fsrl)
[![Tests](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml/badge.svg)](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml) -->
<!-- [![CodeCov](https://img.shields.io/codecov/c/github/liuzuxin/fsrl/main?logo=codecov)](https://app.codecov.io/gh/liuzuxin/fsrl) -->
<!-- [![tests](https://img.shields.io/github/actions/workflow/status/liuzuxin/fsrl/test.yml?label=tests&logo=github)](https://github.com/liuzuxin/fsrl/tree/HEAD/tests) -->
</div>
---
**OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.
The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [DSRL](https://github.com/liuzuxin/DSRL) and [FSRL](https://github.com/liuzuxin/FSRL), and is built to facilitate the development of robust and reliable offline safe RL solutions.
To learn more, please visit our [project website](http://www.offline-saferl.org).
## Structure
The structure of this repo is as follows:
```
├── examples
│ ├── configs # the training configs of each algorithm
│ ├── eval # the evaluation scripts
│ ├── train # the training scripts
├── osrl
│ ├── algorithms # offline safe RL algorithms
│ ├── common # base networks and utils
```
The implemented offline safe RL and imitation learning algorithms include:
| Algorithm | Type | Description |
|:-------------------:|:-----------------:|:------------------------:|
| BCQ-Lag | Q-learning | [BCQ](https://arxiv.org/pdf/1812.02900.pdf) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| BEAR-Lag | Q-learning | [BEARL](https://arxiv.org/abs/1906.00949) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| CPQ | Q-learning | [Constraints Penalized Q-learning (CPQ)](https://arxiv.org/abs/2107.09003) |
| COptiDICE | Distribution Correction Estimation | [Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation](https://arxiv.org/abs/2204.08957) |
| CDT | Sequential Modeling | [Constrained Decision Transformer](https://arxiv.org/abs/2302.07351) |
| BC-All | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with all datasets |
| BC-Safe | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with safe trajectories |
| BC-Frontier | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with high-reward trajectories |
## Installation
Clone the repo and install it locally:
```
git clone https://github.com/liuzuxin/OSRL.git
cd OSRL
pip install -e .
pip install OApackage==2.7.6
```
## How to use OSRL
Example usage lives in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.
All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
OSRL uses the `WandbLogger` from [FSRL](https://github.com/liuzuxin/FSRL), and the offline datasets and environments are provided by [DSRL](https://github.com/liuzuxin/DSRL), so make sure you install both of them first.
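As a minimal sketch, loading an offline environment and its dataset through DSRL looks roughly like the following (this assumes the Gymnasium-based, d4rl-style `get_dataset()` API described in the DSRL repo; the exact dataset keys may differ):

```python
import gymnasium as gym
import dsrl  # registers the Offline* environments

# Create an offline safe RL environment and fetch its dataset.
env = gym.make("OfflineCarCircle-v0")
dataset = env.get_dataset()  # dict of numpy arrays

# Safe RL datasets carry a cost signal alongside the usual d4rl-style fields.
print(dataset["observations"].shape, dataset["costs"].shape)
```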
### Training
For example, to train the `bcql` method, run the training script and override the default parameters as needed:
```shell
python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
```
By default, the config file and the training logs are written to the `logs/` folder, and the training curves can be viewed online with Wandb.
You can also launch a batch of experiments sequentially or in parallel via the [EasyRunner](https://github.com/liuzuxin/easy-runner) package; see `examples/train_all_tasks.py` for details. A plain-`subprocess` alternative is sketched below.
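If you prefer not to use EasyRunner, a minimal sketch using only the standard library can launch several training runs in parallel. The task names and the `--seed` override here are illustrative assumptions; match them to the configs in `examples/configs`.

```python
import subprocess

# Hypothetical task list and seeds; adjust to your experiment plan.
tasks = ["OfflineCarCircle-v0", "OfflineAntRun-v0"]
seeds = [0, 1, 2]

# Launch every (task, seed) combination as its own training process.
procs = [
    subprocess.Popen(
        ["python", "examples/train/train_bcql.py", "--task", task, "--seed", str(seed)]
    )
    for task in tasks
    for seed in seeds
]

# Wait for all runs to finish.
for p in procs:
    p.wait()
```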
### Evaluation
To evaluate a trained agent, for example a BCQ-Lag agent, run
```
python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
```
It will load the config from `path_to_model/config.yaml` and the model from `path_to_model/checkpoints/model.pt`, run 20 evaluation episodes, and print the average normalized reward and cost.
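For reference, the evaluation script therefore expects the training output directory to look roughly like this (file names taken from the description above):
```
path_to_model
├── config.yaml
└── checkpoints
    └── model.pt
```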