rl-replicas

Name	rl-replicas
Version	0.0.7
home_page	None
Summary	Reinforcement Learning Replications is a set of Pytorch implementations of reinforcement learning algorithms.
upload_time	2024-12-15 03:42:54
maintainer	None
docs_url	None
author	Yamato Kataoka
requires_python	~=3.10
license	None
keywords	rl_replicas reinforcement learning deep learning pytorch
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Reinforcement Learning Replications
Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms.


## Features

- Implemented algorithms
  - Vanilla Policy Gradient (VPG)
  - Trust Region Policy Optimization (TRPO)
  - Proximal Policy Optimization (PPO)
  - Deep Deterministic Policy Gradient (DDPG)
  - Twin Delayed DDPG (TD3)
- Uses the Python standard logging library
- Supports TensorBoard
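
Because logging goes through the standard library, training messages can be surfaced with ordinary `logging` configuration. The sketch below is one way to do that; the logger name `rl_replicas` is an assumption based on the common `logging.getLogger(__name__)` convention, not something this README confirms.

```python
import logging

# Assumption: the package's loggers live under the "rl_replicas" namespace,
# following the usual logging.getLogger(__name__) convention.
LOGGER_NAME = "rl_replicas"


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send log records to the console and set the package logger's level."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # replace any handlers installed earlier (e.g. by a notebook)
    )
    package_logger = logging.getLogger(LOGGER_NAME)
    package_logger.setLevel(level)
    return package_logger


logger = configure_logging(logging.INFO)
logger.info("logging configured")
```

If TensorBoard event files are written under the training output directory (also an assumption here), they can be viewed with `tensorboard --logdir <output_dir>`.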


## Benchmarks

Benchmark results are available [here](https://yamatokataoka.github.io/reinforcement-learning-replications/benchmarks/visualization.html).

The benchmark follows [the Benchmarks for Spinning Up Implementations](https://spinningup.openai.com/en/latest/spinningup/bench.html).

Each experiment was run with 3 random seeds. TensorBoard and experiment logs, training scripts, and trained models are stored in the [benchmarks](https://github.com/yamatokataoka/reinforcement-learning-replications/tree/main/benchmarks) folder.
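
Results from several seeds are usually summarized per epoch as a mean and a spread. The following is a minimal sketch of that aggregation, not the repository's own benchmark script; the function name and the numbers are placeholders for illustration only.

```python
from statistics import mean, stdev


def aggregate_over_seeds(
    returns_per_seed: list[list[float]],
) -> list[tuple[float, float]]:
    """Return (mean, sample standard deviation) per epoch across seeds."""
    # Truncate to the shortest run so every epoch has a value from each seed.
    num_epochs = min(len(run) for run in returns_per_seed)
    summary = []
    for epoch in range(num_epochs):
        values = [run[epoch] for run in returns_per_seed]
        summary.append((mean(values), stdev(values)))
    return summary


# Placeholder numbers for illustration only; these are not benchmark results.
example_runs = [
    [10.0, 20.0, 30.0],  # seed 0
    [12.0, 22.0, 33.0],  # seed 1
    [14.0, 24.0, 36.0],  # seed 2
]
summary = aggregate_over_seeds(example_runs)
```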

## Example Code

The following code trains PPO on the CartPole-v1 environment. You can also run it in [this Google Colab notebook](https://colab.research.google.com/drive/18MRw1FcDS4b_t3HAgfvyxBCi_1Z4lD__#scrollTo=A5GI_PJSchBn).

```python
import gymnasium as gym
import torch
import torch.nn as nn

from rl_replicas.algorithms import PPO
from rl_replicas.networks import MLP
from rl_replicas.policies import CategoricalPolicy
from rl_replicas.samplers import BatchSampler
from rl_replicas.value_function import ValueFunction

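# Experiment settings. "/content/ppo" is a Google Colab path; change it elsewhere.
env_name = "CartPole-v1"
output_dir = "/content/ppo"
num_epochs = 80
seed = 0

# Network and optimizer hyperparameters.
network_hidden_sizes = [64, 64]
policy_learning_rate = 3e-4
value_function_learning_rate = 1e-3

# Create the environment and seed its action space.
env = gym.make(env_name)
env.action_space.seed(seed)

# CartPole-v1 has a 4-dimensional observation and 2 discrete actions.
observation_size: int = env.observation_space.shape[0]
action_size: int = env.action_space.n

# The policy network outputs one value per discrete action.
policy_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [action_size]
)

# The value function network outputs a single scalar.
value_function_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [1]
)

model: PPO = PPO(
    CategoricalPolicy(
        network=policy_network,
        optimizer=torch.optim.Adam(
            policy_network.parameters(), lr=policy_learning_rate
        ),
    ),
    ValueFunction(
        network=value_function_network,
        optimizer=torch.optim.Adam(
            value_function_network.parameters(), lr=value_function_learning_rate
        ),
    ),
    env,
    BatchSampler(env, seed),
)

# Train for num_epochs epochs, using output_dir as the output directory.
model.learn(num_epochs=num_epochs, output_dir=output_dir)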
```
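
To make the `sizes` argument in the example concrete, the sketch below works out the layer shapes for CartPole-v1, which has 4 observation dimensions and 2 discrete actions. It assumes `MLP` builds a stack of fully connected layers with biases from consecutive entries of `sizes`; that is an assumption about the library, and the helper names are hypothetical.

```python
def layer_shapes(sizes: list[int]) -> list[tuple[int, int]]:
    """Pair consecutive sizes into (inputs, outputs) for each layer."""
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(sizes: list[int]) -> int:
    """Count weights plus biases, assuming fully connected layers with biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(sizes))


observation_size = 4  # CartPole-v1 observation dimensions
action_size = 2       # CartPole-v1 discrete actions
network_hidden_sizes = [64, 64]

policy_sizes = [observation_size] + network_hidden_sizes + [action_size]
value_sizes = [observation_size] + network_hidden_sizes + [1]

print(layer_shapes(policy_sizes))      # [(4, 64), (64, 64), (64, 2)]
print(count_parameters(policy_sizes))  # 4610
print(count_parameters(value_sizes))   # 4545
```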

## Contributing

All contributions are welcome.

### Release Flow

1. Create a release branch.
1. Open a pull request from the release branch to the `main` branch with the following:
   - Change logs in the body.
   - The `release` label.
   - A commit that bumps the version in `VERSION`.
1. Once the pull request is ready, merge it. The CI will upload the package and create the release.
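
The version bump in the release flow can be scripted. The sketch below assumes `VERSION` holds a single `MAJOR.MINOR.PATCH` string such as `0.0.7`; the file format and both helper names are assumptions, not part of the project's tooling.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = version.strip().split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def bump_version_file(path: Path) -> str:
    """Rewrite the version file in place and return the new version."""
    new_version = bump_patch(path.read_text())
    path.write_text(new_version + "\n")
    return new_version


print(bump_patch("0.0.7"))  # 0.0.8
```

Calling `bump_version_file(Path("VERSION"))` from the repository root would then update the file before committing.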

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "rl-replicas",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "~=3.10",
    "maintainer_email": null,
    "keywords": "rl_replicas, reinforcement learning, deep learning, pytorch",
    "author": "Yamato Kataoka",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/51/7a/61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d/rl_replicas-0.0.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Reinforcement Learning Replications is a set of Pytorch implementations of reinforcement learning algorithms.",
    "version": "0.0.7",
    "project_urls": {
        "Homepage": "https://github.com/yamatokataoka/reinforcement-learning-replications"
    },
    "split_keywords": [
        "rl_replicas",
        " reinforcement learning",
        " deep learning",
        " pytorch"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2593c887c888d6a5ccfa0e9f1fc4056fbf920853ac6b3906a1ad3baf4478c28a",
                "md5": "68844c526e237ad2ada265f17e64b7b9",
                "sha256": "e93f4df1ec6fc67add85ac84887ba4027862e0cf0c19c27968323765a05a3e12"
            },
            "downloads": -1,
            "filename": "rl_replicas-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "68844c526e237ad2ada265f17e64b7b9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "~=3.10",
            "size": 33468,
            "upload_time": "2024-12-15T03:42:52",
            "upload_time_iso_8601": "2024-12-15T03:42:52.018785Z",
            "url": "https://files.pythonhosted.org/packages/25/93/c887c888d6a5ccfa0e9f1fc4056fbf920853ac6b3906a1ad3baf4478c28a/rl_replicas-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "517a61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d",
                "md5": "ccbf4c9abf9d468ca172bee293144dd1",
                "sha256": "03e9b89649b0cfee96fd1f6ce1f41c42405146ec4e13744fc70a9b89d27a0073"
            },
            "downloads": -1,
            "filename": "rl_replicas-0.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "ccbf4c9abf9d468ca172bee293144dd1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "~=3.10",
            "size": 20955,
            "upload_time": "2024-12-15T03:42:54",
            "upload_time_iso_8601": "2024-12-15T03:42:54.179930Z",
            "url": "https://files.pythonhosted.org/packages/51/7a/61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d/rl_replicas-0.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-15 03:42:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yamatokataoka",
    "github_project": "reinforcement-learning-replications",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "rl-replicas"
}
