# Reinforcement Learning Replications
Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms.
## Features
- Implemented algorithms
  - Vanilla Policy Gradient (VPG)
  - Trust Region Policy Optimization (TRPO)
  - Proximal Policy Optimization (PPO)
  - Deep Deterministic Policy Gradient (DDPG)
  - Twin Delayed DDPG (TD3)
- Uses the Python standard logging library
- Supports TensorBoard
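
Because the library logs through Python's standard `logging` module, a training script can surface those messages with ordinary logging configuration. The following is a minimal sketch; the logger name `rl_replicas` is an assumption based on the package name and is not confirmed by this README.

```python
import logging

# Send INFO-level and higher records to the console with timestamps.
# force=True replaces any handlers that were configured earlier.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# Assumed logger name, derived from the package name (not confirmed).
library_logger = logging.getLogger("rl_replicas")
library_logger.setLevel(logging.INFO)
```

Raising the level to `logging.WARNING` in both places would quiet routine progress messages while keeping warnings and errors.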
## Benchmarks
You can check the benchmark results [here](https://yamatokataoka.github.io/reinforcement-learning-replications/benchmarks/visualization.html).
The benchmarks follow [the Benchmarks for Spinning Up Implementations](https://spinningup.openai.com/en/latest/spinningup/bench.html).
Each experiment was run with 3 random seeds. All details, including TensorBoard and experiment logs, training scripts, and trained models, are stored in the [benchmarks](https://github.com/yamatokataoka/reinforcement-learning-replications/tree/main/benchmarks) folder.
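
Results from several seeds are commonly summarized by their mean and spread. The following is a minimal sketch of that aggregation. The numbers are placeholders chosen for illustration only and are not taken from the benchmark logs, and the function name `summarize_seeds` is hypothetical.

```python
from statistics import mean, stdev


def summarize_seeds(returns_by_seed: dict[int, float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation across seeds."""
    values = list(returns_by_seed.values())
    return mean(values), stdev(values)


# Placeholder final returns for seeds 0, 1 and 2 (illustrative only,
# not real benchmark results).
example_returns = {0: 100.0, 1: 110.0, 2: 120.0}

average, spread = summarize_seeds(example_returns)
print(f"mean={average:.1f} std={spread:.1f}")  # mean=110.0 std=10.0
```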
## Example Code
The following code trains PPO on the CartPole-v1 environment. You can run it in [this Google Colab notebook](https://colab.research.google.com/drive/18MRw1FcDS4b_t3HAgfvyxBCi_1Z4lD__#scrollTo=A5GI_PJSchBn).
```python
import gymnasium as gym
import torch
import torch.nn as nn

from rl_replicas.algorithms import PPO
from rl_replicas.networks import MLP
from rl_replicas.policies import CategoricalPolicy
from rl_replicas.samplers import BatchSampler
from rl_replicas.value_function import ValueFunction

# Experiment settings
env_name = "CartPole-v1"
output_dir = "/content/ppo"
num_epochs = 80
seed = 0

# Network and optimizer hyperparameters
network_hidden_sizes = [64, 64]
policy_learning_rate = 3e-4
value_function_learning_rate = 1e-3

env = gym.make(env_name)
env.action_space.seed(seed)

# CartPole-v1 has a flat observation vector and a discrete action space
observation_size: int = env.observation_space.shape[0]
action_size: int = env.action_space.n

# The policy network has one output per discrete action
policy_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [action_size]
)

# The value function network has a single scalar output
value_function_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [1]
)

model: PPO = PPO(
    CategoricalPolicy(
        network=policy_network,
        optimizer=torch.optim.Adam(
            policy_network.parameters(), lr=policy_learning_rate
        ),
    ),
    ValueFunction(
        network=value_function_network,
        optimizer=torch.optim.Adam(
            value_function_network.parameters(), lr=value_function_learning_rate
        ),
    ),
    env,
    BatchSampler(env, seed),
)

model.learn(num_epochs=num_epochs, output_dir=output_dir)
```
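
The `sizes` arguments in the example are plain Python list concatenations. CartPole-v1 has a 4-dimensional observation and 2 discrete actions, so the lists can be worked out by hand without installing anything. The sketch below does that. Pairing consecutive sizes into layer shapes assumes that `MLP` stacks fully connected layers between neighboring sizes, which this README does not confirm, and `layer_shapes` is a hypothetical helper.

```python
def layer_shapes(sizes: list[int]) -> list[tuple[int, int]]:
    """Pair each size with the next one: (inputs, outputs) per layer."""
    return list(zip(sizes[:-1], sizes[1:]))


# CartPole-v1: 4 observation values, 2 discrete actions.
observation_size = 4
action_size = 2
network_hidden_sizes = [64, 64]

policy_sizes = [observation_size] + network_hidden_sizes + [action_size]
value_sizes = [observation_size] + network_hidden_sizes + [1]

print(policy_sizes)                # [4, 64, 64, 2]
print(layer_shapes(policy_sizes))  # [(4, 64), (64, 64), (64, 2)]
print(layer_shapes(value_sizes))   # [(4, 64), (64, 64), (64, 1)]
```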
## Contributing
All contributions are welcome.
### Release Flow
1. Create a release branch.
1. Open a pull request from the release branch to the `main` branch that includes the following:
   - Change logs in the body.
   - The `release` label.
   - A commit that bumps the version in `VERSION`.
1. Once the pull request is ready, merge it. The CI will upload the package and create the release.
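
The version bump can be done by hand or with a small script. The following is a minimal sketch, assuming `VERSION` holds a single `MAJOR.MINOR.PATCH` string such as `0.0.7`. The helpers `bump_patch` and `bump_version_file` are hypothetical and are not part of the repository.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return f"{major}.{minor}.{patch + 1}"


def bump_version_file(path: Path) -> str:
    """Rewrite the file at `path` with the bumped version and return it."""
    new_version = bump_patch(path.read_text())
    path.write_text(new_version + "\n")
    return new_version


print(bump_patch("0.0.7"))  # 0.0.8
```

Calling `bump_version_file(Path("VERSION"))` from the repository root would update the file in place, after which the change can be committed on the release branch.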
## Raw data
{
"_id": null,
"home_page": null,
"name": "rl-replicas",
"maintainer": null,
"docs_url": null,
"requires_python": "~=3.10",
"maintainer_email": null,
"keywords": "rl_replicas, reinforcement learning, deep learning, pytorch",
"author": "Yamato Kataoka",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/51/7a/61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d/rl_replicas-0.0.7.tar.gz",
"platform": null,
"description": "# Reinforcement Learning Replications\nReinforcement Learning Replications is a set of Pytorch implementations of reinforcement learning algorithms.\n\n\n## Features\n\n- Implement Algorithms\n - Vanilla Policy Gradient (VPG)\n - Trust Region Policy Optimization (TRPO)\n - Proximal Policy Optimization (PPO)\n - Deep Deterministic Policy Gradient (DDPG)\n - Twin Delayed DDPG (TD3)\n- Use Python standard logging library\n- Support TensorBoard\n\n\n## Benchmarks\n\nYou can check the benchmark result [here](https://yamatokataoka.github.io/reinforcement-learning-replications/benchmarks/visualization.html).\n\nThis benchmark is conducted based on [the Benchmarks for Spinning Up Implementations](https://spinningup.openai.com/en/latest/spinningup/bench.html).\n\nAll experiments were run for 3 random seeds each. All the details such as tensorboard and experiment logs, training scripts and trained models are stored in the [benchmarks](https://github.com/yamatokataoka/reinforcement-learning-replications/tree/main/benchmarks) folder.\n\n## Example Code\n\nHere is the code of training PPO on CartPole-v1 environment. 
You can run with [this Google Colab notebook](https://colab.research.google.com/drive/18MRw1FcDS4b_t3HAgfvyxBCi_1Z4lD__#scrollTo=A5GI_PJSchBn).\n\n```python\nimport gymnasium as gym\nimport torch\nimport torch.nn as nn\n\nfrom rl_replicas.algorithms import PPO\nfrom rl_replicas.networks import MLP\nfrom rl_replicas.policies import CategoricalPolicy\nfrom rl_replicas.samplers import BatchSampler\nfrom rl_replicas.value_function import ValueFunction\n\nenv_name = \"CartPole-v1\"\noutput_dir = \"/content/ppo\"\nnum_epochs = 80\nseed = 0\n\nnetwork_hidden_sizes = [64, 64]\npolicy_learning_rate = 3e-4\nvalue_function_learning_rate = 1e-3\n\nenv = gym.make(env_name)\nenv.action_space.seed(seed)\n\nobservation_size: int = env.observation_space.shape[0]\naction_size: int = env.action_space.n\n\npolicy_network: nn.Module = MLP(\n sizes=[observation_size] + network_hidden_sizes + [action_size]\n)\n\nvalue_function_network: nn.Module = MLP(\n sizes=[observation_size] + network_hidden_sizes + [1]\n)\n\nmodel: PPO = PPO(\n CategoricalPolicy(\n network=policy_network,\n optimizer=torch.optim.Adam(policy_network.parameters(), lr=3e-4),\n ),\n ValueFunction(\n network=value_function_network,\n optimizer=torch.optim.Adam(value_function_network.parameters(), lr=1e-3),\n ),\n env,\n BatchSampler(env, seed),\n)\n\nmodel.learn(num_epochs=num_epochs, output_dir=output_dir)\n\n```\n\n## Contributing\n\nAll contributions are welcome.\n\n### Release Flow\n\n1. Create a release branch.\n1. A pull request from the release branch to the `main` branch has the following:\n - Change logs in the body.\n - The `release` label.\n - Commit that bumps up the version in `VERSION`.\n1. Once the pull request is ready, merge the pull request. The CI will upload the package and create the release.\n",
"bugtrack_url": null,
"license": null,
"summary": "Reinforcement Learning Replications is a set of Pytorch implementations of reinforcement learning algorithms.",
"version": "0.0.7",
"project_urls": {
"Homepage": "https://github.com/yamatokataoka/reinforcement-learning-replications"
},
"split_keywords": [
"rl_replicas",
" reinforcement learning",
" deep learning",
" pytorch"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2593c887c888d6a5ccfa0e9f1fc4056fbf920853ac6b3906a1ad3baf4478c28a",
"md5": "68844c526e237ad2ada265f17e64b7b9",
"sha256": "e93f4df1ec6fc67add85ac84887ba4027862e0cf0c19c27968323765a05a3e12"
},
"downloads": -1,
"filename": "rl_replicas-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "68844c526e237ad2ada265f17e64b7b9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.10",
"size": 33468,
"upload_time": "2024-12-15T03:42:52",
"upload_time_iso_8601": "2024-12-15T03:42:52.018785Z",
"url": "https://files.pythonhosted.org/packages/25/93/c887c888d6a5ccfa0e9f1fc4056fbf920853ac6b3906a1ad3baf4478c28a/rl_replicas-0.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "517a61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d",
"md5": "ccbf4c9abf9d468ca172bee293144dd1",
"sha256": "03e9b89649b0cfee96fd1f6ce1f41c42405146ec4e13744fc70a9b89d27a0073"
},
"downloads": -1,
"filename": "rl_replicas-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "ccbf4c9abf9d468ca172bee293144dd1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.10",
"size": 20955,
"upload_time": "2024-12-15T03:42:54",
"upload_time_iso_8601": "2024-12-15T03:42:54.179930Z",
"url": "https://files.pythonhosted.org/packages/51/7a/61ce67295bd8a0b0abbc50f9094da1f44802d6999858eaabf61132ee783d/rl_replicas-0.0.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-15 03:42:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yamatokataoka",
"github_project": "reinforcement-learning-replications",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "rl-replicas"
}