# RL Toolkit
[![Release](https://img.shields.io/github/release/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/releases)
![Tag](https://img.shields.io/github/v/tag/markub3327/rl-toolkit)
[![Issues](https://img.shields.io/github/issues/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/issues)
![Commits](https://img.shields.io/github/commit-activity/w/markub3327/rl-toolkit)
![Languages](https://img.shields.io/github/languages/count/markub3327/rl-toolkit)
![Size](https://img.shields.io/github/repo-size/markub3327/rl-toolkit)
## Papers
* [**Playing Flappy Bird Based on Motion Recognition Using a Transformer Model and LIDAR Sensor**](https://www.mdpi.com/1424-8220/24/6/1905)
* [**Soft Actor-Critic**](https://arxiv.org/abs/1812.05905)
* [**Generalized State-Dependent Exploration**](https://arxiv.org/abs/2005.05719)
* [**Reverb: A framework for experience replay**](https://arxiv.org/abs/2102.04736)
* [**Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics**](https://arxiv.org/abs/2005.04269)
* [**Acme: A Research Framework for Distributed Reinforcement Learning**](https://arxiv.org/abs/2006.00979)
* [**Dueling Network Architectures for Deep Reinforcement Learning**](https://arxiv.org/abs/1511.06581)
* [**Attention Is All You Need**](https://arxiv.org/abs/1706.03762)
* [**An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale**](https://arxiv.org/abs/2010.11929)
## Installation with PyPI
### On PC AMD64 with Ubuntu/Debian
1. Install dependencies
```sh
apt update -y
apt install swig -y
```
2. Install RL-Toolkit
```sh
pip3 install rl-toolkit[all]
```
3. Run (for **Server**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 server
```
Run (for **Agent**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 agent
```
Run (for **Learner**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 learner --db_server 192.168.1.2
```
Run (for **Tester**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 tester -f save/model/actor.h5
```
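For a quick single-machine smoke test, the commands above can be combined into one script. This is a minimal sketch under stated assumptions: all processes run on one host (so the learner reaches the Reverb server at `localhost`), the SAC config ships in `./config`, and the CLI flags are exactly those shown above; the `launch_all` helper is hypothetical, not part of the toolkit.

```sh
#!/bin/sh
# Hypothetical helper: launch one server, one learner, and N agents in the
# background, then wait for all of them to exit.
launch_all() {
  config="$1"; env_id="$2"; n_agents="$3"
  rl_toolkit -c "$config" -a sac -e "$env_id" server &
  rl_toolkit -c "$config" -a sac -e "$env_id" learner --db_server localhost &
  i=0
  while [ "$i" -lt "$n_agents" ]; do
    rl_toolkit -c "$config" -a sac -e "$env_id" agent &
    i=$((i + 1))
  done
  wait   # block until every background process has exited
}

launch_all ./config/sac.yaml BipedalWalkerHardcore-v3 2
```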
### On NVIDIA Jetson
1. Install dependencies
<br>Install TensorFlow for JetPack by following the instructions [here](https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html).
```sh
sudo apt install swig -y
```
2. Install Reverb
<br>Download Bazel 3.7.2 for arm64 from [here](https://github.com/bazelbuild/bazel), then put it on your `PATH`:
```sh
mkdir ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
```
Clone Reverb and check out the release that corresponds to the TensorFlow version installed on the NVIDIA Jetson.
```sh
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.9.0
```
Apply the following changes to Reverb before building.
<br>In .bazelrc
```bazel
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
```
In WORKSPACE
```bazel
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
```
In oss_build.sh
```bazel
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
# Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
```
In reverb/cc/platform/default/repo.bzl
```bazel
urls = [
- "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+ "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
]
```
In reverb/pip_package/build_pip_package.sh
```sh
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
```
Build and install
```sh
bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
```
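Before cleaning up, it can be worth checking that the freshly built wheel actually imports. This is an optional sketch (assumption: Reverb's Python module is imported as `reverb`); it prints a warning instead of aborting if the import does not work.

```sh
# Optional sanity check: verify the freshly built Reverb wheel can be imported.
if python3 -c "import reverb" 2>/dev/null; then
  echo "Reverb import OK"
else
  echo "Reverb import failed - check the build log"
fi
```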
Clean up
```sh
cd ../
rm -R reverb/
```
3. Install RL-Toolkit
```sh
pip3 install rl-toolkit
```
## Environments
| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
| ------------------------ |:-----------------:|:--------------------:|:------------:|:------------------:| :-----------: |
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| FlappyBird-v0 | (16, 180) | [0, d<sub>max</sub>] | (2, ) | {DO NOTHING, FLAP} | [-1.0, 1.0] |
## Results
| Environment | SAC<br> + gSDE | SAC<br> + gSDE<br>+ Huber loss | SAC<br> + TQC<br> + gSDE | Q-Learning | RL-Toolkit |
| ------------------------ |:----------------------------------------------------------------------------------------------:|:------------------------------:|:-----------------------------------------------------------------------------------------------:|:----------:|:----------:|
| BipedalWalkerHardcore-v3 | 13 ± 18[<sup>(1)</sup>](https://sb3-contrib.readthedocs.io/en/stable/modules/tqc.html#results) | **239 ± 118** | 228 ± 18[<sup>(1)</sup>](https://sb3-contrib.readthedocs.io/en/stable/modules/tqc.html#results) | - | 205 ± 134 |
| FlappyBird-v0 | - |-| - | 209.298[<sup>(2)</sup>](https://arxiv.org/pdf/2003.09579) | 13 156 |
![dm_ant_ball_sac](https://raw.githubusercontent.com/markub3327/rl-toolkit/master/img/dm_ant_ball_sac.gif)
## Releases
* SAC + gSDE + Huber loss is stored in [branch r2.0](https://github.com/markub3327/rl-toolkit/tree/r2.0)
* SAC + TQC + gSDE + LogCosh + Reverb is stored in [branch r4.1](https://github.com/markub3327/rl-toolkit/tree/r4.1)
* DQN + SAC agents are stored in [branch r4.0](https://github.com/markub3327/rl-toolkit/)
----------------------------------
**Frameworks:** TensorFlow, DeepMind Reverb, Gymnasium, DeepMind Control Suite, Weights & Biases, OpenCV