msdm

Name: msdm
Version: 0.11
Home page: https://github.com/markkho/msdm
Summary: Models of sequential decision making
Author: Mark Ho
License: MIT
Keywords: reinforcement learning, planning, cognitive science
Upload time: 2023-10-17 22:21:49
Requirements: none recorded
# `msdm`: Models of Sequential Decision-Making

## Goals
`msdm` aims to simplify the design and evaluation of
models of sequential decision-making. The library
can be used for cognitive science or computer
science research/teaching.

## Approach
`msdm` provides standardized interfaces and implementations
for common constructs in sequential
decision-making. This includes algorithms used in single-agent
[reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning)
as well as those used in
[planning](https://en.wikipedia.org/wiki/Automated_planning_and_scheduling),
[partially observable environments](https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process),
and [multi-agent games](https://en.wikipedia.org/wiki/Stochastic_game).

The library is organized around different **problem classes**
and **algorithms** that operate on **problem instances**.
We take inspiration from existing libraries such as
[scikit-learn](https://scikit-learn.org/) that
enable users to transparently mix and match components.
For instance, a standard way to define a problem, solve it,
and examine the results would be:

```python
# create a problem instance
# (assumes make_russell_norvig_grid and ValueIteration have been imported from msdm)
mdp = make_russell_norvig_grid(
    discount_rate=0.95,
    slip_prob=0.8,
)

# solve the problem
vi = ValueIteration()
res = vi.plan_on(mdp)

# print the value function
print(res.V)
```
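
Because algorithms and problem instances are decoupled, a different planner can be dropped in without changing the problem definition. The snippet below sketches that pattern; it assumes `PolicyIteration` follows the same `plan_on` interface as `ValueIteration` and that `res.V` behaves like a mapping from states to values (assumptions for illustration, not shown above):

```python
# sketch: reuse the same problem instance with a different planner
# (assumes PolicyIteration exposes the same plan_on interface as ValueIteration)
pi_res = PolicyIteration().plan_on(mdp)

# compare the two planners' value functions, assuming .V acts like a
# mapping from states to values
for s in res.V:
    print(s, res.V[s], pi_res.V[s])
```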

The library is under active development. Currently,
we support the following problem classes:

- Markov Decision Processes (MDPs)
- Partially Observable Markov Decision Processes (POMDPs)
- Markov Games
- Partially Observable Stochastic Games (POSGs)

The following algorithms have been implemented and
tested:

- Classical Planning
    - Breadth-First Search (Zuse, 1945)
    - A* (Hart, Nilsson & Raphael, 1968)
- Stochastic Planning
    - Value Iteration (Bellman, 1957; a standalone sketch of its core backup follows this list)
    - Policy Iteration (Howard, 1960)
    - Labeled Real-time Dynamic Programming ([Bonet & Geffner, 2003](https://www.aaai.org/Papers/ICAPS/2003/ICAPS03-002.pdf))
    - LAO* ([Hansen & Zilberstein, 2003](https://www.sciencedirect.com/science/article/pii/S0004370201001060))
- Partially Observable Planning
    - QMDP ([Littman, Cassandra & Kaelbling, 1995](https://www.sciencedirect.com/science/article/pii/B9781558603776500529))
    - Point-based Value-Iteration ([Pineau, Gordon & Thrun, 2003](https://dl.acm.org/doi/abs/10.5555/1630659.1630806))
    - Finite state controller gradient ascent ([Meuleau, Kim, Kaelbling & Cassandra, 1999](https://arxiv.org/abs/1301.6720))
    - Bounded finite state controller policy iteration ([Poupart & Boutilier, 2003](https://dl.acm.org/doi/abs/10.5555/2981345.2981448))
    - Wrappers for [POMDPs.jl](https://juliapomdp.github.io/POMDPs.jl/latest/) solvers (requires Julia installation)
- Reinforcement Learning
    - Q-Learning (Watkins, 1992)
    - Double Q-Learning ([van Hasselt, 2010](https://proceedings.neurips.cc/paper/2010/hash/091d584fced301b442654dd8c23b3fc9-Abstract.html))
    - SARSA ([Rummery & Niranjan, 1994](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.2539&rep=rep1&type=pdf))
    - Expected SARSA ([van Seijen, van Hasselt, Whiteson & Wiering, 2009](https://ieeexplore.ieee.org/abstract/document/4927542))
    - R-MAX ([Brafman & Tennenholtz, 2002](https://www.jmlr.org/papers/volume3/brafman02a/brafman02a.pdf))
- Multi-agent Reinforcement Learning (in progress)
    - Correlated Q Learning ([Greenwald & Hall, 2002](https://dl.acm.org/doi/abs/10.5555/3041838.3041869))
    - Nash Q Learning ([Hu & Wellman, 2003](https://dl.acm.org/doi/abs/10.5555/945365.964288))
    - Friend/Foe Q Learning ([Littman, 2001](https://dl.acm.org/doi/abs/10.5555/645530.655661))
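
For readers less familiar with these methods, here is a minimal, self-contained sketch of the value-iteration backup on a tiny tabular MDP. It uses plain Python dictionaries and is independent of `msdm`'s interfaces; the state names, transition structure, and convergence threshold are illustrative only:

```python
# minimal standalone value iteration on a 2-state tabular MDP
# (illustrative only; not msdm's implementation)

# transitions[s][a] = list of (next_state, probability, reward)
transitions = {
    "s0": {"stay": [("s0", 1.0, 0.0)],
           "go":   [("s1", 0.9, 1.0), ("s0", 0.1, 0.0)]},
    "s1": {"stay": [("s1", 1.0, 0.0)],
           "go":   [("s0", 1.0, 0.0)]},
}
discount = 0.95

V = {s: 0.0 for s in transitions}
for _ in range(1000):
    # Bellman backup: V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + discount * V(s'))
    V_new = {
        s: max(
            sum(p * (r + discount * V[ns]) for ns, p, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in transitions.items()
    }
    if max(abs(V_new[s] - V[s]) for s in V) < 1e-8:  # stop once values stabilize
        V = V_new
        break
    V = V_new

print(V)
```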

We aim to add implementations for other algorithms in the
near future (e.g., inverse RL, deep learning, multi-agent learning and planning).

# Installation

It is recommended to use a [virtual environment](https://virtualenv.pypa.io/en/latest/index.html).
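
For example, using the standard-library `venv` module:

```bash
$ python -m venv .venv
$ source .venv/bin/activate  # on Windows: .venv\Scripts\activate
```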

## Installing from PyPI

```bash
$ pip install msdm
```

## Installing from GitHub
```bash
$ pip install --upgrade git+https://github.com/markkho/msdm.git
```
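
To install a specific release rather than the latest commit, append a Git tag (for example `v0.11`, the tag corresponding to the PyPI release above):

```bash
$ pip install git+https://github.com/markkho/msdm.git@v0.11
```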

## Installing the package in editable mode

After cloning or downloading the repository, change into its folder and install the package locally (pip creates an editable, symlinked install, so changes to the source files take effect without reinstalling):

```bash
$ pip install -e .
```
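
For example, the full sequence starting from a fresh clone:

```bash
$ git clone https://github.com/markkho/msdm.git
$ cd msdm
$ pip install -e .
```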

# Contributing

We welcome contributions in the form of implementations of
algorithms for common problem classes that are
well-documented in the literature. Please first
post an issue and/or
reach out to <mark.ho.cs@gmail.com>
to check if a proposed contribution is within the
scope of the library.

## Running tests, etc.

To run all tests: `make test`

To run the tests in a single file: `python -m py.test msdm/tests/$TEST_FILE_NAME.py`

To lint the code: `make lint`
            
