matrix-mdp-gym

Name: matrix-mdp-gym
Version: 1.1.1
Home page: https://github.com/Paul-543NA/matrix-mdp-gym
Summary: An OpenAI gym / Gymnasium environment to seamlessly create discrete MDPs from matrices.
Upload time: 2023-02-02 11:32:06
Author: Paul Festor
Requires Python: >=3.6
License: MIT License
Keywords: reinforcement-learning, reinforcement-learning-environment, gym-environment, markov-decision-processes, gym, openai-gym, gymnasium
# Matrix MDP
[![Downloads](https://pepy.tech/badge/matrix-mdp-gym)](https://pepy.tech/project/matrix-mdp-gym)

Easily generate an MDP from transition and reward matrices.

Want to learn more about the story behind this repo? Check out the blog post [here](https://www.paul-festor.com/post/i-created-a-python-library)!


## Installation
Install the package from PyPI with the following command:
```bash
pip install matrix-mdp-gym
```

## Usage
```python
import gymnasium as gym
import matrix_mdp
env = gym.make('matrix_mdp/MatrixMDP-v0')
```

## Environment documentation

### Description

A flexible environment providing a Gym/Gymnasium API for discrete MDPs with `N_s` states and `N_a` actions, defined by:
 - an initial state distribution vector $P_0(S)$,
 - a transition probability matrix $P(S' | S, A)$, and
 - a reward matrix $R(S', S, A)$ giving the reward for reaching state $S'$ after taking action $A$ in state $S$.

### Action Space

The action is an `ndarray` with shape `(1,)` representing the index of the action to execute.

### Observation Space

The observation is an `ndarray` with shape `(1,)` representing the index of the state the agent is in.

### Rewards

The reward function is defined by the reward matrix `r` provided when the environment is created.

### Starting State

The starting state is a random state sampled from $P_0$.

### Episode Truncation

The episode is truncated when a terminal state is reached.
Terminal states are inferred from the transition probability matrix: a state $s$ is terminal if
$\sum_{s' \in S} \sum_{a \in A} P(s' | s, a) = 0$,
i.e. it has no outgoing transition probability.
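
In code, this check amounts to summing out $s'$ and $a$. A small sketch, assuming `p` is the `(n_states, n_states, n_actions)` transition array described in the Arguments section with the first axis indexing $s'$:

```python
import numpy as np

def terminal_states(p: np.ndarray) -> np.ndarray:
    """Return indices of states with no outgoing transition probability."""
    # p[s_next, s, a] = P(s_next | s, a); a state s is terminal when
    # every outgoing probability from s is zero.
    return np.flatnonzero(p.sum(axis=(0, 2)) == 0)
```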

### Arguments

- `p_0`: `ndarray` of shape `(n_states,)` representing the initial state probability distribution.
- `p`: `ndarray` of shape `(n_states, n_states, n_actions)` representing the transition dynamics $P(S' | S, A)$.
- `r`: `ndarray` of shape `(n_states, n_states, n_actions)` representing the reward matrix.

```python
import gymnasium as gym
import matrix_mdp

env = gym.make('matrix_mdp/MatrixMDP-v0', p_0=p_0, p=p, r=r)
```
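
Once built, the environment follows the standard Gymnasium `reset`/`step` API. A minimal rollout sketch with a random policy, reusing the `env` created above, could look like:

```python
# One episode with randomly sampled actions (standard Gymnasium loop).
observation, info = env.reset()
terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
```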

### Version History

* `v0`: Initial version release

## Acknowledgements

Thanks to [Will Dudley](https://github.com/WillDudley) for his help with learning how to put a Python package together.

            
