# RL Toolkit
[![Release](https://img.shields.io/github/release/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/releases)
![Tag](https://img.shields.io/github/v/tag/markub3327/rl-toolkit)
[![Issues](https://img.shields.io/github/issues/markub3327/rl-toolkit)](https://github.com/markub3327/rl-toolkit/issues)
![Commits](https://img.shields.io/github/commit-activity/w/markub3327/rl-toolkit)
![Languages](https://img.shields.io/github/languages/count/markub3327/rl-toolkit)
![Size](https://img.shields.io/github/repo-size/markub3327/rl-toolkit)
## Papers
* [**Playing Flappy Bird Based on Motion Recognition Using a Transformer Model and LIDAR Sensor**](https://www.mdpi.com/1424-8220/24/6/1905)
* [**Soft Actor-Critic**](https://arxiv.org/abs/1812.05905)
* [**Generalized State-Dependent Exploration**](https://arxiv.org/abs/2005.05719)
* [**Reverb: A framework for experience replay**](https://arxiv.org/abs/2102.04736)
* [**Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics**](https://arxiv.org/abs/2005.04269)
* [**Acme: A Research Framework for Distributed Reinforcement Learning**](https://arxiv.org/abs/2006.00979)
* [**Dueling Network Architectures for Deep Reinforcement Learning**](https://arxiv.org/abs/1511.06581)
* [**Attention Is All You Need**](https://arxiv.org/abs/1706.03762)
* [**An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale**](https://arxiv.org/abs/2010.11929)
## Installation with PyPI
### On PC AMD64 with Ubuntu/Debian
1. Install dependencies
```sh
apt update -y
apt install swig -y
```
2. Install RL-Toolkit
```sh
pip3 install rl-toolkit[all]
```
3. Run (for **Server**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 server
```
Run (for **Agent**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 agent
```
Run (for **Learner**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 learner --db_server 192.168.1.2
```
Run (for **Tester**)
```sh
rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 tester -f save/model/actor.h5
```
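For a quick single-machine smoke test, the commands above can be combined into one script. This is a minimal sketch under stated assumptions: all processes run on one host (so the learner reaches the Reverb server at `localhost`), the SAC config ships in `./config`, and the CLI flags are exactly those shown above; the `launch_all` helper is hypothetical, not part of the toolkit.

```sh
#!/bin/sh
# Hypothetical helper: launch one server, one learner, and N agents in the
# background, then wait for all of them to exit.
launch_all() {
  config="$1"; env_id="$2"; n_agents="$3"
  rl_toolkit -c "$config" -a sac -e "$env_id" server &
  rl_toolkit -c "$config" -a sac -e "$env_id" learner --db_server localhost &
  i=0
  while [ "$i" -lt "$n_agents" ]; do
    rl_toolkit -c "$config" -a sac -e "$env_id" agent &
    i=$((i + 1))
  done
  wait   # block until every background process has exited
}

launch_all ./config/sac.yaml BipedalWalkerHardcore-v3 2
```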
### On NVIDIA Jetson
1. Install dependencies
<br>Install TensorFlow for JetPack by following the instructions [here](https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html).
```sh
sudo apt install swig -y
```
2. Install Reverb
<br>Download Bazel 3.7.2 for arm64 from [here](https://github.com/bazelbuild/bazel), then put it on your `PATH`:
```sh
mkdir ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
```
Clone Reverb and check out the release that corresponds to the TensorFlow version installed on the NVIDIA Jetson.
```sh
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.9.0
```
Apply the following changes to Reverb before building.
<br>In .bazelrc
```bazel
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
```
In WORKSPACE
```bazel
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
```
In oss_build.sh
```bazel
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
# Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
```
In reverb/cc/platform/default/repo.bzl
```bazel
urls = [
- "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+ "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
]
```
In reverb/pip_package/build_pip_package.sh
```sh
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
```
Build and install
```sh
bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
```
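Before cleaning up, it can be worth checking that the freshly built wheel actually imports. This is an optional sketch (assumption: Reverb's Python module is imported as `reverb`); it prints a warning instead of aborting if the import does not work.

```sh
# Optional sanity check: verify the freshly built Reverb wheel can be imported.
if python3 -c "import reverb" 2>/dev/null; then
  echo "Reverb import OK"
else
  echo "Reverb import failed - check the build log"
fi
```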
Clean up
```sh
cd ../
rm -R reverb/
```
3. Install RL-Toolkit
```sh
pip3 install rl-toolkit
```
## Environments
| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
| ------------------------ |:-----------------:|:--------------------:|:------------:|:------------------:| :-----------: |
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| FlappyBird-v0 | (16, 180) | [0, d<sub>max</sub>] | (2, ) | {DO NOTHING, FLAP} | [-1.0, 1.0] |
## Results
| Environment | SAC<br> + gSDE | SAC<br> + gSDE<br>+ Huber loss | SAC<br> + TQC<br> + gSDE | Q-Learning | RL-Toolkit |
| ------------------------ |:----------------------------------------------------------------------------------------------:|:------------------------------:|:-----------------------------------------------------------------------------------------------:|:----------:|:----------:|
| BipedalWalkerHardcore-v3 | 13 ± 18[<sup>(1)</sup>](https://sb3-contrib.readthedocs.io/en/stable/modules/tqc.html#results) | **239 ± 118** | 228 ± 18[<sup>(1)</sup>](https://sb3-contrib.readthedocs.io/en/stable/modules/tqc.html#results) | - | 205 ± 134 |
| FlappyBird-v0 | - |-| - | 209.298[<sup>(2)</sup>](https://arxiv.org/pdf/2003.09579) | 13 156 |
![dm_ant_ball_sac](https://raw.githubusercontent.com/markub3327/rl-toolkit/master/img/dm_ant_ball_sac.gif)
## Releases
* SAC + gSDE + Huber loss is stored in [branch r2.0](https://github.com/markub3327/rl-toolkit/tree/r2.0)
* SAC + TQC + gSDE + LogCosh + Reverb is stored in [branch r4.1](https://github.com/markub3327/rl-toolkit/tree/r4.1)
* DQN + SAC agents are stored in [branch r4.0](https://github.com/markub3327/rl-toolkit/)
----------------------------------
**Frameworks:** TensorFlow, DeepMind Reverb, Gymnasium, DeepMind Control Suite, Weights & Biases, OpenCV