torchbringer


| Field | Value |
|---|---|
| Name | torchbringer |
| Version | 0.3.6 |
| Home page | https://github.com/moraguma/TorchBringer |
| Summary | A PyTorch library for deep reinforcement learning |
| Upload time | 2024-06-26 15:24:35 |
| Author | Moraguma |
| Maintainer | None |
| Requires Python | None |
| License | MIT |
| Requirements | No requirements were recorded |

TorchBringer is an open-source framework that provides a simple interface to pre-implemented deep reinforcement learning algorithms built on top of PyTorch. The interfaces can be used to operate deep RL agents either locally or remotely, through a gRPC or socket server. Currently, TorchBringer supports the following algorithms:

- [x] DQN

## Quickstart

To install TorchBringer, run

```bash
pip install --upgrade pip
pip install torchbringer
```

### Local
Here's a simple project for running a TorchBringer agent on gymnasium's CartPole-v1 environment.

```python
import gymnasium as gym
from itertools import count
import torch
from torchbringer.servers.torchbringer_agent import TorchBringerAgent

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

env = gym.make("CartPole-v1")

config = {
    # Check the reference section to understand config formatting
}

dqn = TorchBringerAgent()
dqn.initialize(config)

num_episodes = 600
for i_episode in range(num_episodes):
    # Each episode starts from a fresh state with zero reward and terminal = False
    state, info = env.reset()
    state = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    reward = torch.tensor([0.0], device=device)
    terminal = False

    for t in count():
        # step() receives the current percept and returns the action to take
        action = dqn.step(state, reward, terminal)
        observation, reward, terminated, truncated, _ = env.step(action.item())
        # A terminated episode has no next state
        state = None if terminated else torch.tensor(observation, dtype=torch.float32, device=device).unsqueeze(0)
        reward = torch.tensor([reward], device=device)
        terminal = terminated or truncated

        if terminal:
            # Hand the final percept to the agent before starting a new episode
            dqn.step(state, reward, terminal)
            break
```

### Server
To start a TorchBringer server on a particular port, run

```bash
python -m torchbringer.servers.grpc.torchbringer_grpc_server <PORT> # For gRPC
python -m torchbringer.servers.socket.torchbringer_socket_server <PORT> # For socket
```

You can communicate with this server using the provided Python client (see below). To reach the server from applications written in other programming languages, develop a client of your own from the files found in `torchbringer/servers/grpc` in this repo.

```python
from torchbringer.servers.grpc.torchbringer_grpc_client import TorchBringerGRPCAgentClient
```

## Reference

`cartpole_local_dqn.py` provides a simple example of TorchBringer being used on gymnasium's CartPole-v1 environment. `cartpole_grpc_dqn.py` provides an example of how to use the gRPC interface to learn remotely.

The main class of this framework is `TorchBringerAgent`, implemented in `servers/` (imported as `torchbringer.servers.torchbringer_agent`). The gRPC and socket servers expose interfaces very similar to it.

### TorchBringerAgent
| Method | Parameters | Explanation |
|---|---|---|
| initialize() | config: dict | Initializes the agent according to the config. Read the config section for information on formatting |
| step() | state: Tensor, reward: Tensor, terminal: bool | Performs an optimization step and returns the selected action for the given state |

### gRPC interface
Note that there is a client implemented in `servers/grpc/torchbringer_grpc_client.py` that has the exact same interface as `TorchBringerAgent`. This reference is mostly meant for building clients in other programming languages.

| Method | Parameters | Explanation |
|---|---|---|
| initialize() | config: string | Accepts a serialized config dict |
| step() | state: Matrix(dimensions: list[int], value: list[float]), reward: float, terminal: bool | The state should be given as a flattened matrix; the action is returned the same way |
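
For clients written from scratch, the state has to be packed into the `Matrix` shape described in the table. The sketch below shows one way to do that in plain Python. It assumes row-major flattening, which this reference does not state, so check the files in `servers/grpc` before relying on it. `flatten_matrix` and `unflatten_matrix` are hypothetical helper names, not part of TorchBringer.

```python
from math import prod


def flatten_matrix(nested):
    """Return (dimensions, values) for a rectangular nested list, row-major."""
    dimensions = []
    level = nested
    while isinstance(level, list):
        dimensions.append(len(level))
        level = level[0] if level else None
    # Peel off one nesting level per dimension until only scalars remain.
    values = [nested]
    for _ in dimensions:
        values = [item for sub in values for item in sub]
    return dimensions, [float(v) for v in values]


def unflatten_matrix(dimensions, values):
    """Rebuild the nested list from (dimensions, values)."""
    if prod(dimensions) != len(values):
        raise ValueError("dimensions do not match the number of values")
    result = list(values)
    # Group from the innermost dimension outwards.
    for size in reversed(dimensions[1:]):
        result = [result[i:i + size] for i in range(0, len(result), size)]
    return result


# A CartPole-style state with a leading batch dimension of 1.
dimensions, values = flatten_matrix([[0.1, 0.2, 0.3, 0.4]])
```

The same pair of helpers can decode the returned action, since it comes back in the same flattened form.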

### Socket interface
Note that there is a client implemented in `servers/socket/torchbringer_socket_client.py` that has the exact same interface as `TorchBringerAgent`. This reference is mostly meant for building clients in other programming languages.

Servers expect to receive a JSON string containing the field "method", which specifies the method by name, plus other parameters that depend on the method. After being called, the server returns a response in the form of another JSON string.

| Method | Parameters | Explanation | Returns |
|---|---|---|---|
| "initialize" | config: JSON object | Accepts a serialized config dict | Information in the form {"info": string} |
| "step" | state: list, reward: float, terminal: bool | The current percept from which to act | The action to take in the form {"action": list} |
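
As a rough illustration of the payloads in the table, the sketch below builds the two request strings and reads an action out of a response. It assumes the parameters sit at the top level of the JSON object next to "method", which is how the table reads but is not spelled out. The function names are hypothetical, and no networking is shown.

```python
import json


def initialize_request(config):
    """Serialize an "initialize" call carrying the config dict."""
    return json.dumps({"method": "initialize", "config": config})


def step_request(state, reward, terminal):
    """Serialize a "step" call carrying the current percept."""
    return json.dumps({
        "method": "step",
        "state": state,
        "reward": float(reward),
        "terminal": bool(terminal),
    })


def parse_action(response):
    """Extract the action list from a {"action": list} response."""
    return json.loads(response)["action"]


request = step_request([[0.0, 0.1, 0.2, 0.3]], 1, False)
```

How messages are delimited on the socket is not described in this reference, so the transport is left out of the sketch.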

## Config formatting
The config is a dictionary that specifies the behavior of the agent. The RL implementation is selected by the value of the key "type". The config also accepts a variety of other arguments depending on the implementation type.

Currently, the only supported implementation is `dqn`.

The following tables specify the arguments accepted by each implementation type.

### DQN
| Argument | Explanation |
|---|---|
| "run_name": string | Optional. If given, episode reward and average loss are tracked through Aim for this run |
| "action_space": dict | The gym Space that represents the action space of the environment. See the Space table under `Other specifications` |
| "gamma": float | Discount factor (gamma) |
| "tau": float = 1.0 | Value of tau, used when updating the target network |
| "target_network_update_frequency": int = 1 | Number of steps between target network updates based on tau |
| "epsilon": dict | The epsilon schedule. See the Epsilon table under `Other specifications` |
| "batch_size": int | Batch size |
| "grad_clip_value": float | Value at which gradients are clipped. No clipping if not specified |
| "loss": dict | The loss. See the Loss table under `Other specifications` |
| "optimizer": dict | The optimizer. See the Optimizer table under `Other specifications` |
| "replay_buffer_size": int | Capacity of the replay buffer |
| "network": list[dict] | List of layer specs for the neural network. See the Layers table under `Other specifications` |
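
To make the roles of "tau" and "target_network_update_frequency" more concrete, here is a minimal sketch of the soft (Polyak) target update commonly used with DQN. It assumes TorchBringer follows the usual convention `target = tau * policy + (1 - tau) * target`; that is not taken from the library's source. Plain dicts of floats stand in for network weights, and both function names are hypothetical.

```python
def soft_update(target, policy, tau=1.0):
    """Blend the policy weights into the target weights in place."""
    for name, policy_weight in policy.items():
        target[name] = tau * policy_weight + (1.0 - tau) * target[name]
    return target


def maybe_update(step, target, policy, tau=1.0, frequency=1):
    """Apply the soft update only on steps that are multiples of `frequency`."""
    if step % frequency == 0:
        soft_update(target, policy, tau)
    return target


# With tau = 0.5 the target moves halfway towards the policy weights.
halfway = soft_update({"w": 0.0}, {"w": 2.0}, tau=0.5)
```

Under this assumption, the defaults (`tau = 1.0`, frequency 1) copy the policy weights on every step, while a small tau moves the target network slowly.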

### Other specifications

These are specifications for the dictionaries used to configure learners. Each has a "type" argument that maps to a class or function. For classes, all of the constructor's parameters can be passed as additional arguments in the dictionary. Where specific arguments are expected, they are listed explicitly.
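
The "type plus constructor arguments" pattern can be pictured as a lookup followed by a keyword-argument call. The sketch below only mimics that pattern: `REGISTRY`, `build`, and the stand-in `Discrete` class are hypothetical, and TorchBringer's own resolution code may differ.

```python
from dataclasses import dataclass


@dataclass
class Discrete:
    """Stand-in for gym.spaces.Discrete so the sketch has no dependencies."""
    n: int
    start: int = 0


# Hypothetical mapping from "type" strings to classes.
REGISTRY = {"discrete": Discrete}


def build(spec):
    """Instantiate the class named by spec["type"] with the remaining entries."""
    kwargs = dict(spec)  # copy so the caller's dict is left untouched
    cls = REGISTRY[kwargs.pop("type")]
    return cls(**kwargs)


spec = {"type": "discrete", "n": 2}
space = build(spec)
```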

#### Space
| Type | Class |
|---|---|
| discrete | `gym.spaces.Discrete` |

#### Epsilon
You can read `components/epsilon.py` to see how each of these is implemented.

| Type | Arguments | Explanation |
|---|---|---|
| exp_decrease | "start": float, "end": float, "steps_to_end": int | Decreases the epsilon exponentially over time |
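
The exact schedule lives in `components/epsilon.py`. Purely as an illustration, one exponential form that starts at "start" and reaches "end" after "steps_to_end" steps is a geometric interpolation between the two values. This is an assumption about the shape of the curve, not a copy of the library's formula, and the function below is a hypothetical stand-alone helper.

```python
def exp_decrease(step, start, end, steps_to_end):
    """Geometric interpolation from `start` to `end` over `steps_to_end` steps.

    Requires start > 0 and end > 0; the value is held at `end` afterwards.
    """
    progress = min(max(step, 0), steps_to_end) / steps_to_end
    return start * (end / start) ** progress


# Values from the example config: start 0.9, end 0.05, 1000 steps.
epsilon_midway = exp_decrease(500, 0.9, 0.05, 1000)
```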

#### Loss
| Type | Class |
|---|---|
| smooth_l1_loss | `torch.nn.SmoothL1Loss` |
| mseloss | `torch.nn.MSELoss` |

#### Optimizer
| Type | Class |
|---|---|
| adamw | `torch.optim.AdamW` |
| rmsprop | `torch.optim.RMSprop` |

#### Layers
| Type | Class |
|---|---|
| linear | `torch.nn.Linear` |
| relu | `torch.nn.ReLU` |

### Example config

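```python
# n_observations and n_actions are assumed to be defined beforehand, e.g. for CartPole-v1:
# n_observations = env.observation_space.shape[0]
# n_actions = env.action_space.n
config = {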
    "type": "dqn",
    "action_space": {
        "type": "discrete",
        "n": 2
    },
    "gamma": 0.99,
    "tau": 0.005,
    "epsilon": {
        "type": "exp_decrease",
        "start": 0.9,
        "end": 0.05,
        "steps_to_end": 1000
    },
    "batch_size": 128,
    "grad_clip_value": 100,
    "loss": "smooth_l1_loss",
    "optimizer": {
        "type": "adamw",
        "lr": 1e-4, 
        "amsgrad": True
    },
    "replay_buffer_size": 10000,
    "network": [
        {
            "type": "linear",
            "in_features": int(n_observations),
            "out_features": 128,
        },
        {"type": "relu"},
        {
            "type": "linear",
            "in_features": 128,
            "out_features": 128,
        },
        {"type": "relu"},
        {
            "type": "linear",
            "in_features": 128,
            "out_features": int(n_actions),
        },
    ]
}
```
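
Before handing a config to `initialize()`, it can help to compare its keys with the DQN table. The check below is a sketch under an assumption: arguments listed with a default or described as optional are treated as optional, and everything else, including "type", as required. The library may be stricter or more lenient, and `check_dqn_config` is a hypothetical helper.

```python
REQUIRED_DQN_KEYS = {
    "type", "action_space", "gamma", "epsilon", "batch_size",
    "loss", "optimizer", "replay_buffer_size", "network",
}
OPTIONAL_DQN_KEYS = {
    "run_name", "tau", "target_network_update_frequency", "grad_clip_value",
}


def check_dqn_config(config):
    """Return (missing, unknown) key lists, both sorted alphabetically."""
    keys = set(config)
    missing = sorted(REQUIRED_DQN_KEYS - keys)
    unknown = sorted(keys - REQUIRED_DQN_KEYS - OPTIONAL_DQN_KEYS)
    return missing, unknown


# A deliberately incomplete config with one misplaced key.
missing, unknown = check_dqn_config({"type": "dqn", "gamma": 0.99, "lr": 1e-4})
```

Here "lr" is reported as unknown because it belongs inside the "optimizer" dictionary rather than at the top level.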

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/moraguma/TorchBringer",
    "name": "torchbringer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Moraguma",
    "author_email": "g170603@dac.unicamp.br",
    "download_url": "https://files.pythonhosted.org/packages/b1/a8/ee1257c7e6ab9bdcea94bc177ba1127de49dacbef17c52528ac0429d92dd/torchbringer-0.3.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A PyTorch library for deep reinforcement learning",
    "version": "0.3.6",
    "project_urls": {
        "Homepage": "https://github.com/moraguma/TorchBringer"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1e52c310482baf9a57025ca8ebaf77090b7842b99a0d24840a5e7495cd691cec",
                "md5": "0d8fadf409f65db9173b48d3d9227706",
                "sha256": "cbcd7017ec475d0c6c7ac74ae2611e48584c07d38aff00d9348befbc0e28e410"
            },
            "downloads": -1,
            "filename": "torchbringer-0.3.6-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0d8fadf409f65db9173b48d3d9227706",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 2082319,
            "upload_time": "2024-06-26T15:24:34",
            "upload_time_iso_8601": "2024-06-26T15:24:34.319493Z",
            "url": "https://files.pythonhosted.org/packages/1e/52/c310482baf9a57025ca8ebaf77090b7842b99a0d24840a5e7495cd691cec/torchbringer-0.3.6-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b1a8ee1257c7e6ab9bdcea94bc177ba1127de49dacbef17c52528ac0429d92dd",
                "md5": "467ac9be7250fffd777ef6c579e03fbb",
                "sha256": "2fe690a8b790bfbc1ea06c09b52808bf52646edf4262f126d40b6cd102725c53"
            },
            "downloads": -1,
            "filename": "torchbringer-0.3.6.tar.gz",
            "has_sig": false,
            "md5_digest": "467ac9be7250fffd777ef6c579e03fbb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 18361,
            "upload_time": "2024-06-26T15:24:35",
            "upload_time_iso_8601": "2024-06-26T15:24:35.980877Z",
            "url": "https://files.pythonhosted.org/packages/b1/a8/ee1257c7e6ab9bdcea94bc177ba1127de49dacbef17c52528ac0429d92dd/torchbringer-0.3.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-26 15:24:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "moraguma",
    "github_project": "TorchBringer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "torchbringer"
}
        