# coltra

- **Name:** coltra
- **Version:** 0.1.9
- **Summary:** Coltra is a simple moddable RL algorithm implementation
- **Uploaded:** 2023-09-09 23:04:39
- **Requirements:** typarse, numpy, gymnasium, jupyter, jupyterlab, mlagents-envs, PyYAML, scipy, matplotlib, seaborn, tensorboard, torch, tqdm, ipykernel, numba, pytest, coverage, wandb, pybullet, opencv-python, PettingZoo, supersuit, cloudpickle, pillow, setuptools, pyvirtualdisplay, optuna, pytype, jax, jaxlib, shortuuid, black, protobuf
<!-- # Coltra RL -->

![coltra logo](https://user-images.githubusercontent.com/19414946/139559727-d71caab7-1467-47a5-ac82-acdb9062e85f.png)

Figured I can finally open-source this. 
Coltra, a portmanteau of **Col**lect and **tra**in, is the RL framework I've been developing for my PhD work, out of frustration with all the other existing libraries.



At the time of writing, it only contains an implementation of PPO, although I intend to change that soon. 
And if my initial designs were correct, that should prove to be quite easy. Note: the current code
is tightly connected with my thesis work. After I defend, I might start decoupling it, and we'll see what happens then.

My main philosophy for coltra is that it should be easy to modify, and easy to access literally any detail of the RL algorithm that you might want. 
For that reason, I expect that many potential users will even create their own forks, adapting the code to their own needs.

General note about terminology: the context of this project is crowd simulation, so the word "crowd" will pop up sometimes.
You can assume it basically just refers to "homogeneous multiagent something with parameter sharing".

## Another RL framework? Why?

Simple answer - because I wasn't able to use any of the existing ones. Stable Baselines 3 only barely supports multiagent scenarios,
and is only barely hackable. RLlib is super fast, but a nightmare to modify in any way that deviates from the norm. 
CleanRL is very simple, but its components are not reusable at all.

Coltra can be thought of as a linear interpolation between CleanRL and SB3, with a focus on multiagent environments.

## Quickstart

Proper docs are yet to be written, but here's an outline of how to use this library.

### Installation

The library was initially written on Python 3.8, and then ported to 3.9. 
There are no guarantees that it will work on any earlier versions, although it should be easy to make that happen.

The procedure is the usual:

```shell
git clone https://github.com/redtachyon/coltra-rl
cd coltra-rl
pip install -r requirements.txt
pip install -e .
```
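
The package is also published on PyPI (see the metadata above), so if you don't need the bleeding edge, a plain install should presumably work too:

```shell
pip install coltra
```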

We particularly invite people to make their own forks of the library to implement whatever crazy ideas they have.

### At a glance

```python
from tqdm import trange

from coltra.models import MLPModel
from coltra.agents import DAgent
from coltra.groups import HomogeneousGroup

from coltra.envs import MultiGymEnv
from coltra.policy_optimization import CrowdPPOptimizer
from coltra.collectors import collect_crowd_data

if __name__ == '__main__':
    env = MultiGymEnv.get_venv(8, env_name="CartPole-v1")

    agents = HomogeneousGroup(  # Parameter-shared group agent
        DAgent(  # Individual Discrete Agent
            MLPModel(  # Policy and Value neural network
                {
                    "input_size": env.observation_space.shape[0],
                    "num_actions": env.action_space.n,
                    "discrete": True
                },
                action_space=env.action_space
            )
        )
    )

    ppo = CrowdPPOptimizer(  # PPO optimizer with full parameter sharing
        agents=agents,  # We're optimizing the agents in the group
        config={}  # Default config
    )

    for _ in trange(10):
        # Collect a batch of data using the current policy
        data_batch, collector_metrics, data_shape = collect_crowd_data(agents=agents, env=env, num_steps=100)

        # Train the current policy on the data, using PPO
        metrics = ppo.train_on_data(data_dict=data_batch, shape=data_shape)

    print(metrics)
    env.close()
```


## Usage

Here we describe the main abstractions and how to actually use this library.

### Configs
One basic unintuitive thing might be the usage of `typarse`. Check it out on [GitHub](https://github.com/RedTachyon/typarse)
to learn more; in short, it's a tool that generates argparsers and configs based on type hints. 
In particular, you can do something like this:

```python
from typarse import BaseConfig
from typing import List

class MLPConfig(BaseConfig):
    input_size: int = 0  # Must be set
    num_actions: int = 0  # Must be set
    discrete: bool = None  # Must be set

    activation: str = "leaky_relu"
    sigma0: float = 0.5

    std_head: bool = True

    hidden_sizes: List[int] = [64, 64]

    initializer: str = "kaiming_uniform"

Config: MLPConfig = MLPConfig.clone()

config = {  # Read externally, e.g. from a yaml
    "input_size": 5,
    "num_actions": 2,
    "discrete": True
}

Config.update(config)
```

With this, `Config` gets the values passed to it in `.update()`, and all its values are typed. Neat!
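
After `.update()`, the values are available as ordinary typed attributes on the config class. Continuing the snippet above (the exact semantics are in the typarse docs):

```python
assert Config.input_size == 5
assert Config.num_actions == 2
assert Config.discrete is True
assert Config.hidden_sizes == [64, 64]  # untouched defaults keep their values
```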

### MultiAgentEnv

An environment is specified in terms of a `coltra.envs.MultiAgentEnv`. It's a rather simple MARL interface
with two unusual class methods - `cls.get_env_creator` and `cls.get_venv`. The first creates a constructor function for the environment and performs
any optional setup that might be necessary. The latter often uses that function and creates a (subprocess) vectorized environment,
with `n` copies of the original environment.

Importantly, **everything** here is multiagent. There is no notion of a single-agent environment - it's just a special case
of a multiagent environment where `num_agents == 1`. This allows us to treat a vectorized environment exactly the same way 
as a regular environment. In a VecEnv, agents have a component in their name describing which of the environments
they belong to, e.g. `pursuer_0&env=3`.
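
The convention is plain string formatting, so if you ever need the per-environment index back, something like this does it (a hypothetical helper, not part of the library):

```python
def split_agent_id(agent_id: str) -> tuple[str, int]:
    """Split a VecEnv agent name like 'pursuer_0&env=3' into ('pursuer_0', 3)."""
    name, _, env_idx = agent_id.partition("&env=")
    return name, int(env_idx)

assert split_agent_id("pursuer_0&env=3") == ("pursuer_0", 3)
```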

#### Try it!

The simplest way to get an environment is using either `MultiGymEnv` for multiagentified Gym environments, or `PettingZooEnv` for PettingZoo envs.

```python
from pettingzoo.sisl import pursuit_v4

from coltra.buffers import Action
from coltra.envs import PettingZooEnv, MultiGymEnv

# env = MultiGymEnv.get_venv(workers=8, env_name="CartPole-v1")  # Creates 8 copies of CartPole

env = PettingZooEnv.get_venv(workers=8, env_fn=pursuit_v4.parallel_env)  # Creates 8 copies of Pursuit, 8 agents each

obs = env.reset()  # Look at the structure of observations

obs, reward, done, info = env.step({agent_id: Action(discrete=env.action_space.sample()) for agent_id in obs})

```

### Observation and Action

This is something that the static-typing/functional-programming nerd in me demanded very loudly. 

Basically, environments always output `Dict[str, Observation]` as observations, and expect `Dict[str, Action]` as actions.

`Observation` and `Action` are both defined in `coltra.buffers` and are glorified dataclasses/dictionaries
with some convenience methods. They hold either `np.ndarray`s or `torch.Tensor`s, and perhaps will
be made into explicit generics on that. An Action can hold a continuous action, a discrete action, or a dictionary (not nested) of those.
An Observation can similarly hold a vector or a number of them in a dictionary. There is in principle no difference in how they're
treated, but it allows for multimodal models, e.g. one that receives both a vector observation and raycasts. 
This will (hopefully) make sense when you see how Models are treated.

Note that both Observation and Action can hold either individual values, or batches.

The whole point of this is that now, every environment's output is the same type: `Observation`. This is different
from the usual `gym` model, where the output might be a `np.ndarray` or a `tuple` or a `dict` or who knows what else.
The same is the case for actions.
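
As a small illustration (the `vector` and `discrete` fields appear in the snippets elsewhere in this README; treat the exact constructor as an assumption):

```python
import numpy as np

from coltra.buffers import Action, Observation

# A batched vector observation: 8 agents, 4 features each
obs = Observation(vector=np.zeros((8, 4), dtype=np.float32))
print(obs.vector.shape)  # (8, 4)

# A single discrete action, as in the PettingZoo example above
act = Action(discrete=1)
```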

### Model

Models are the other side of `Observation`; however, they don't use `Action` yet. 
A Model inherits from `BaseModel` which in turn inherits from `torch.nn.Module`. It should implement two methods:
`forward(x: Observation, state: Tuple, get_value: bool)` and `value(x: Observation, state: Tuple)`.
Check the detailed return signatures in `coltra.models.BaseModel`, but `forward` should return an action `Distribution`, 
the next recurrent state, and a dictionary with other optional outputs, including value.

**Important note about state** - currently, it's unused and is always an empty tuple. 
It gets carried around to potentially support recurrent policies again, but they're a massive pain, so I'm not sure.
For now just ignore it and always make it an empty tuple.

This is where the `Observation` comes in handy - if you have two types of observations in the environment, e.g. a vector
and an image, you can separately access them with `obs.vector` and `obs.image`.
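
To make the shape of the interface concrete, here's a minimal sketch of a vector-only discrete model. This is a hypothetical example, not library code - `BaseModel` may require extra constructor arguments, and the exact return conventions live in `coltra.models.BaseModel`:

```python
import torch
from torch import nn
from torch.distributions import Categorical

from coltra.buffers import Observation
from coltra.models import BaseModel


class TinyDiscreteModel(BaseModel):  # hypothetical, for illustration only
    def __init__(self, input_size: int, num_actions: int):
        super().__init__()
        self.policy_head = nn.Linear(input_size, num_actions)
        self.value_head = nn.Linear(input_size, 1)

    def forward(self, x: Observation, state: tuple = (), get_value: bool = True):
        logits = self.policy_head(x.vector)
        extra = {"value": self.value_head(x.vector).squeeze(-1)} if get_value else {}
        # Action distribution, (unused) recurrent state, dict of extra outputs
        return Categorical(logits=logits), state, extra

    def value(self, x: Observation, state: tuple = ()) -> torch.Tensor:
        return self.value_head(x.vector).squeeze(-1)
```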

### Agent

An `Agent` (see: `coltra.agents.Agent`) is the interface between an environment and a model. 
We have two main types of agents: Continuous `CAgent` and discrete `DAgent`. They should be associated with appropriate models.

Conceptually, what the agent does is this: it holds a neural network model, accepts an `Observation`, gets an action distribution
from the model, samples from it in some way, and finally returns the chosen action. 
This happens in the `agent.act` method.

The second important method is `agent.evaluate`, which is used during optimization. It takes in a batch of observations and actions,
and returns the respective logprobs, values and entropies - i.e. all the stuff you need when training. It also properly handles all gradients.
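
In use, the two methods look roughly like this. The construction mirrors the [At a glance](#at-a-glance) example, but the exact return structure of `act` is my assumption - check `coltra.agents.Agent` for the real signatures:

```python
import gymnasium as gym
import numpy as np

from coltra.agents import DAgent
from coltra.buffers import Observation
from coltra.models import MLPModel

space = gym.make("CartPole-v1").action_space  # Discrete(2)
model = MLPModel(
    {"input_size": 4, "num_actions": space.n, "discrete": True},
    action_space=space,
)
agent = DAgent(model)

obs = Observation(vector=np.zeros((1, 4), dtype=np.float32))  # a batch of one
action, state, extra = agent.act(obs)  # the three-way unpacking is assumed
# During optimization: logprobs, values, entropies = agent.evaluate(obs_batch, action_batch)
```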

Agents also provide an interface for saving and loading them to the disk.

We also provide several toy agents which take random or constant actions. 
It'd be equally simple to implement an agent performing a specific sequence of actions - I'm sure you see how that can be useful.

Overall this is pretty straightforward, so check out the implementations for further detail.

One thing that's missing and could be helpful is a mixed continuous-discrete agent for slightly more complex action spaces.
But that's also a relatively rare case since gym doesn't even support that, so it's not here yet.


### MacroAgent

Now we get to the spicy part. Because the environments are multiagent-first, our agents should also be multiagent.
This is handled by the `coltra.groups.MacroAgent` interface. It should conceptually do the same things as an Agent,
except operating on dictionaries of observations/actions.

The simplest case (and the only one implemented at the moment) is `HomogeneousGroup` which is really just 
a thin wrapper around `Agent`, but the interface will make it possible to implement more complex examples.

An important element of a MacroAgent is the `policy_mapping`. In a general MacroAgent, 
you might have several policies which are responsible for different environment agents. 
We do the dispatch based on prefixes. To explain with an example: a HomogeneousGroup has a simple `policy_mapping`:

```python 
policy_mapping = {"": self.policy_name}
```

Because `""` (empty string) is a prefix of any string, it will match with any agent name, e.g. `pursuer_0`, `evader_1`

Let's say we have two types of agents, `pursuer_x` and `evader_x`, where `x` can be any integer.
We also have two policies, `pursuer` and `evader`. Our policy mapping can then be:

```python
policy_mapping = {"pursuer": "pursuer", "evader": "evader"}
```

Or, if we're being lazy:

```python
policy_mapping = {"p": "pursuer", "e": "evader"}
```

Or even

```python
policy_mapping = {"p": "pursuer", "": "evader"}
```

Each time we need to match an agent name to a policy name, the group goes through 
all the keys in the `policy_mapping`, from longest to shortest, and checks whether each key is a prefix of the agent name.
If it is, it takes that value; otherwise it keeps searching. If nothing matches, it raises an exception, because something's wrong.
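
In pure Python, the dispatch is essentially this (a sketch of the logic described above, not the library's exact code):

```python
def match_policy(agent_name: str, policy_mapping: dict[str, str]) -> str:
    for prefix in sorted(policy_mapping, key=len, reverse=True):  # longest first
        if agent_name.startswith(prefix):
            return policy_mapping[prefix]
    raise ValueError(f"No policy matches agent {agent_name!r}")

mapping = {"p": "pursuer", "": "evader"}
assert match_policy("pursuer_0", mapping) == "pursuer"
assert match_policy("evader_1&env=3", mapping) == "evader"
```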

This is a relatively new feature, so it still needs to be refined. It assumes that the user Knows What They're Doing (TM), 
so the agents need to be well-named.

RLlib solves the same problem with functions, but functions can't be reliably pickled without some magic. This is simple and works.

#### Try it!

```python
from coltra.models import MLPModel
from coltra.agents import DAgent
from coltra.groups import HomogeneousGroup

from coltra.envs import MultiGymEnv

env = MultiGymEnv.get_venv(8, env_name="CartPole-v1")
model = MLPModel(
    {
        "input_size": env.observation_space.shape[0],
        "num_actions": env.action_space.shape[0],
        "discrete": True
    },
    action_space=env.action_space
)
agent = DAgent(model)
agents = HomogeneousGroup(agent)

# What can you do with model, agent and agents?
```
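
For instance, you could run a single environment step with the whole group. This is hedged - the group-level `act` mirroring `Agent.act` is my reading of the MacroAgent description above:

```python
obs = env.reset()
actions, _, _ = agents.act(obs)  # assumed: dict in, dict out, same tuple shape as Agent.act
obs, reward, done, info = env.step(actions)
```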

### Data collection

During any training, we need to collect some data. This is done by `coltra.collectors.collect_crowd_data`, which collects
data with a `HomogeneousGroup` and puts it into a `coltra.buffers.MemoryRecord`. The procedure is pretty simple -
go ahead and check out the code.

The way it works here is that we expect the environment to automatically reset upon completion. 
We collect a fixed number of steps from each of the vectorized envs and use them for optimization.

### Optimization

This is handled by the `coltra.policy_optimization.CrowdPPOptimizer`. It takes in the data obtained during collection,
and performs gradient updates on the `MacroAgent` with PPO. All the PPO logic is stored in its `train_on_data` method, so you don't
need to go through a series of inheritances (`PPO -> OnPolicyAlgorithm -> BaseAlgorithm`) to know what's going on ;)

### Training

All the components described above are actually everything that you need - see [At a glance](#at-a-glance). 
But for convenience and proper tensorboard logging, we provide `coltra.trainers.PPOCrowdTrainer`, which wraps that logic 
and manages a tensorboard log.

### Scripts

Finally, we have a few scripts that can be used to instantly train or visualize a standard scenario, for example:

```shell
cd scripts
python train_gym.py -c configs/gym_config.yaml -i 500 -e CartPole-v1 -n test_run
python train_pursuit.py -c configs/pursuit_config.yaml -i 500 -n test_run
```

**NOTE:** because nobody other than me has ever used this, the scripts include logging to my wandb account, which will fail
unless you hack my account. Please don't. You can change it, and in a while I plan to make it configurable from the CLI or a file or something.

# Contributing guide

This project is currently *not* encouraging contributions, since it's in a volatile state and I need 
to make sure I have a comfortable base that is somewhat stable and can be built upon.

What I do encourage is feedback -- if something's not clear, or you think something could be done better, let me know.
But no promises, since for the moment at least, it's not a community-driven project.

I plan to change this Soon(TM), and if you're reading this, you'll probably be informed about it.

If you nevertheless fell in love with the project and want to help, I have some simple standards:

1. Type hints. Always. Untyped functions scare me.
2. Consistent formatting - just run `black .`.
3. Make sure that tests pass. Add new tests when you add new stuff.
4. Keep code clean and readable. Single-letter variable names are accepted in mathematical parts of the code, nowhere else.

            
