[Tests](https://github.com/mlaux1/rl-blox/actions/workflows/test.yaml)
[Code style: black](https://github.com/psf/black)
[pre-commit](https://github.com/pre-commit/pre-commit)
[DOI: 10.5281/zenodo.15746631](https://doi.org/10.5281/zenodo.15746631)
# RL-BLOX
<table>
<tr>
<td>
This project contains modular implementations of various model-free and model-based RL algorithms. It provides both deep neural network-based and tabular representations of Q-values, policies, etc., which can be used interchangeably.
The goal of this project is for the authors to learn by reimplementing various RL algorithms and to eventually provide an algorithmic toolbox for research purposes.
</td>
<td><img src="doc/source/_static/rl_blox_logo_v1.png" width="750"/></td>
</tr>
</table>
<img src="doc/source/_static/blox.svg" width="100%"/>
> [!CAUTION]
> This library is still experimental and under development. Using it will not
> result in a good user experience. It is not well-documented, it is buggy,
> its interface is not clearly defined, and its most interesting features live in
> feature branches. We do not recommend using it yet. If you are an RL developer
> and want to collaborate, feel free to contact us.
## Design Principles
The implementation of this project follows these principles:
1. Algorithms are functions!
2. Algorithms are implemented in single files.
3. Policies and value functions are data containers (see the sketch below).
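To make these principles concrete, here is a rough sketch (not actual RL-BLOX code; the function name, the simplified tabular setting, and the hyperparameter defaults are assumptions made for illustration only): the value function is a plain array, and the learning update is a pure function over it.

```python
import jax.numpy as jnp


def q_learning_update(q_table, obs, action, reward, next_obs, terminated,
                      learning_rate=0.1, gamma=0.99):
    """One tabular Q-learning step written as a pure function (sketch only).

    The Q-table is just a jnp.ndarray of shape (n_states, n_actions), i.e. a
    data container without any behavior attached to it.
    """
    # Bootstrapped target: reward plus the discounted value of the best next action.
    target = reward + gamma * (1.0 - terminated) * jnp.max(q_table[next_obs])
    td_error = target - q_table[obs, action]
    # Return a new Q-table instead of mutating the old one.
    return q_table.at[obs, action].add(learning_rate * td_error)


q_table = jnp.zeros((16, 4))  # e.g. a small grid world: 16 states, 4 actions
q_table = q_learning_update(q_table, obs=0, action=1, reward=1.0,
                            next_obs=2, terminated=False)
```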
### Dependencies
1. Our environment interface is [Gymnasium](https://github.com/Farama-Foundation/Gymnasium).
2. We use [JAX](https://github.com/jax-ml/jax) for everything.
3. We use [Chex](https://github.com/google-deepmind/chex) to write reliable code.
4. For optimization algorithms we use [Optax](https://github.com/google-deepmind/optax).
5. For probability distributions we use [TensorFlow Probability](https://www.tensorflow.org/probability).
6. For all neural networks we use [Flax NNX](https://github.com/google/flax) (see the sketch after this list).
7. To save checkpoints we use [Orbax](https://github.com/google/orbax).
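To give a flavor of how these libraries fit together (a minimal sketch with made-up names, not RL-BLOX's actual model code), a small Q-network defined with Flax NNX and an Adam optimizer from Optax look like this:

```python
import jax
import jax.numpy as jnp
import optax
from flax import nnx


class QNetwork(nnx.Module):
    """A small Q-network as a plain NNX module (hypothetical example)."""

    def __init__(self, obs_dim: int, n_actions: int, *, rngs: nnx.Rngs):
        self.hidden = nnx.Linear(obs_dim, 64, rngs=rngs)
        self.out = nnx.Linear(64, n_actions, rngs=rngs)

    def __call__(self, obs):
        return self.out(jax.nn.relu(self.hidden(obs)))


q_net = QNetwork(obs_dim=4, n_actions=2, rngs=nnx.Rngs(0))
q_values = q_net(jnp.ones((1, 4)))  # forward pass on a dummy observation
tx = optax.adam(1e-3)  # gradient transformation applied to q_net's parameters
```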
## Installation
```bash
git clone git@github.com:mlaux1/rl-blox.git
```
After cloning the repository, it is recommended to install the library in editable mode.
```bash
pip install -e .
```
To run the provided examples, use `pip install -e '.[examples]'`.
To install development dependencies, use `pip install -e '.[dev]'`.
To enable logging with [aim](https://github.com/aimhubio/aim), use `pip install -e '.[logging]'`.
You can install all optional dependencies with `pip install -e '.[all]'`.
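Multiple extras can also be combined in a single command using standard pip syntax, e.g. for a development setup:

```bash
pip install -e '.[dev,examples,logging]'
```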
## Getting Started
RL-BLOX relies on Gymnasium's environment interface. The following example
trains a policy on Pendulum-v1 with the SAC algorithm.
```python
import gymnasium as gym
import jax.numpy as jnp
import numpy as np
from rl_blox.algorithm.sac import create_sac_state, train_sac
from rl_blox.logging.checkpointer import OrbaxCheckpointer
from rl_blox.logging.logger import AIMLogger, LoggerList
env_name = "Pendulum-v1"
env = gym.make(env_name)
seed = 1
verbose = 1
env = gym.wrappers.RecordEpisodeStatistics(env)
hparams_models = dict(
policy_hidden_nodes=[128, 128],
policy_learning_rate=3e-4,
q_hidden_nodes=[512, 512],
q_learning_rate=1e-3,
seed=seed,
)
hparams_algorithm = dict(
total_timesteps=11_000,
buffer_size=11_000,
gamma=0.99,
learning_starts=5_000,
)
if verbose:
print(
"This example uses the AIM logger. You will not see any output on "
"stdout. Run 'aim up' to analyze the progress."
)
checkpointer = OrbaxCheckpointer("/tmp/rl-blox/sac_example/", verbose=verbose)
logger = LoggerList([
AIMLogger(),
# uncomment to store checkpoints
# checkpointer,
])
logger.define_experiment(
env_name=env_name,
algorithm_name="SAC",
hparams=hparams_models | hparams_algorithm,
)
logger.define_checkpoint_frequency("policy", 1_000)
sac_state = create_sac_state(env, **hparams_models)
sac_result = train_sac(
env,
sac_state.policy,
sac_state.policy_optimizer,
sac_state.q,
sac_state.q_optimizer,
logger=logger,
**hparams_algorithm,
)
env.close()
policy, _, q, _, _, _, _ = sac_result
# Do something with the trained policy...
```
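After training, the returned policy can be evaluated in a fresh environment. The loop below is only a sketch using the plain Gymnasium API; the line that selects an action is a placeholder, because the exact call for computing actions with the trained RL-BLOX policy depends on the policy class and is not shown here.

```python
import gymnasium as gym

eval_env = gym.make("Pendulum-v1")
obs, info = eval_env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    # Placeholder: replace this random action with one computed by the trained
    # SAC policy (the exact call depends on the rl_blox policy class).
    action = eval_env.action_space.sample()
    obs, reward, terminated, truncated, info = eval_env.step(action)
    episode_return += float(reward)
    done = terminated or truncated
eval_env.close()
print(f"Episode return: {episode_return:.2f}")
```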
## API Documentation
You can build the Sphinx documentation with
```bash
pip install -e '.[doc]'
cd doc
make html
```
The HTML documentation will be available under `doc/build/html/index.html`.
## Contributing
If you wish to report bugs, please use the [issue tracker](https://github.com/mlaux1/rl-blox/issues).
If you would like to contribute to RL-BLOX, just open an issue or a
[pull request](https://github.com/mlaux1/rl-blox/pulls). The target branch for
pull requests is the development branch, which is merged into master for new releases.
If you have questions about the software, please ask them in the discussion section.

The recommended workflow to add a new feature, add documentation, or fix a bug is the following (see the sketch below):

- Push your changes to a branch (e.g. `feature/x`, `doc/y`, or `fix/z`) of your fork of the RL-BLOX repository.
- Open a pull request to the main branch.

Pushing directly to the main branch is not allowed.
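In terms of git commands, this workflow might look roughly like the following (the fork URL, branch name, and commit message are placeholders):

```bash
# Clone your fork and create a topic branch (names are placeholders).
git clone git@github.com:<your-username>/rl-blox.git
cd rl-blox
git checkout -b feature/x

# Commit your changes, push the branch to your fork,
# and then open a pull request on GitHub.
git add .
git commit -m "Describe your change"
git push origin feature/x
```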
## Testing
Run the tests with
```bash
pip install -e '.[dev]'
pytest
```
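During development it is often convenient to run only a subset of the tests with pytest's standard selection options (the test path below is an illustrative placeholder):

```bash
# Run a single test module (path is a placeholder).
pytest tests/test_sac.py

# Run only tests whose names match an expression and stop at the first failure.
pytest -k "sac" -x
```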
## Releases
### Semantic Versioning
Semantic versioning must be used: the major version number is incremented when the API changes in a backwards-incompatible way, the minor version is incremented when new functionality is added in a backwards-compatible manner, and the patch version is incremented for bugfixes, documentation updates, etc. For example, starting from version 1.2.3, a breaking API change leads to 2.0.0, a new backwards-compatible feature to 1.3.0, and a bugfix to 1.2.4.
## Funding
This library is currently developed at the [Robotics Group](https://robotik.dfki-bremen.de/en/about-us/university-of-bremen-robotics-group.html) of the
[University of Bremen](http://www.uni-bremen.de/en.html) together with the
[Robotics Innovation Center](http://robotik.dfki-bremen.de/en/startpage.html) of the
[German Research Center for Artificial Intelligence (DFKI)](http://www.dfki.de) in Bremen.
<p float="left">
<img src="doc/source/_static/Uni_Logo.png" height="100px" />
<img src="doc/source/_static/DFKI_Logo.png" height="100px" />
</p>