opof-pomdp

Name: opof-pomdp
Version: 0.2.0
Home page: https://github.com/opoframework/pomdp
Summary: OPOF domains for online POMDP planning
Upload time: 2023-03-12 22:13:24
Author: Yiyuan Lee
Requires Python: >=3.9
License: BSD-3
Keywords: opof, optimization, machine learning, reinforcement learning, planning, pomdp, uncertainty, robotics
            # opof-pomdp

[OPOF](https://github.com/opoframework/opof) online POMDP planning domains for 2D navigation under uncertainty, centered on the optimization of macro-actions.



[![Build and Test](https://github.com/opoframework/opof-pomdp/actions/workflows/build_and_test.yml/badge.svg)](https://github.com/opoframework/opof-pomdp/actions/workflows/build_and_test.yml)

`opof-pomdp` is maintained by the [Kavraki Lab](https://www.kavrakilab.org/) at Rice University.

### Installation
```console
$ pip install opof-pomdp
```

`opof-pomdp` is officially tested and supported for Python 3.9, 3.10, and 3.11 on Ubuntu 20.04 and 22.04.

## Domain: `POMDPMacro[task,length]`

```python
from opof_pomdp.domains import POMDPMacro
# Creates a POMDPMacro domain instance for the "LightDark" task with macro-action length 8.
domain = POMDPMacro("LightDark", 8) 
```

##### Description
We consider online POMDP planning for a specified task using the [DESPOT](https://www.jair.org/index.php/jair/article/view/11043) online POMDP planner. 
The robot operates in a _partially observable_ world and tracks a belief over the world's state across the actions it has taken. 
Given the current belief at each step, the robot must determine a good action (which corresponds to moving a fixed distance toward some heading) to execute. 
It does so by running DESPOT, which performs an anytime Monte Carlo tree search over possible action and observation sequences, rooted at the current belief, and returns a lower bound value for the computed partial policy. 

##### Planner optimization problem
Since the tree search is exponential in the search depth, `POMDPMacro[task,length]` explores using open-loop *macro-actions* to improve planning efficiency. 
Here, DESPOT is parameterized with a set of $8$ *macro-actions*, each a 2D cubic Bézier curve stretched and discretized into $length$ line segments that 
determine the headings of the corresponding actions in the macro-action. Each Bézier curve is controlled by a *control* vector $\in \mathbb{R}^{2 \times 3}$, which determines the curve's control points. 
Since the shape of a Bézier curve is invariant to a common scaling of its control points, we constrain each control vector to lie on the unit sphere. 
The planner optimization problem is to find a generator $G_\theta(c)$ that maps a problem instance $c$ (in this case, the combination of the current belief, represented as a particle filter, and the current task parameters, 
whose representation depends on the task) to a *joint control* vector $\in \mathbb{R}^{8 \times 2 \times 3}$ (which determines the shapes of the $8$ macro-actions), such that the lower bound value reported by DESPOT is maximized. 
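
For concreteness, here is a minimal sketch (plain NumPy, with purely illustrative variable names) of a joint control vector with the shape and unit-sphere constraint described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# One control vector in R^(2x3) per macro-action; 8 macro-actions in total.
joint_control = rng.normal(size=(8, 2, 3))

# Constrain each control vector to the unit sphere: a Bezier curve's shape is
# invariant to a common scaling of its control points, so only the direction of
# the (flattened) control vector matters.
flat = joint_control.reshape(8, -1)
flat /= np.linalg.norm(flat, axis=1, keepdims=True)
joint_control = flat.reshape(8, 2, 3)

assert np.allclose(np.linalg.norm(joint_control.reshape(8, -1), axis=1), 1.0)
```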

##### Planning objective
$\boldsymbol{f}(x; c)$ is the lower bound value reported by DESPOT under a planning timeout of $100$ ms. 
When evaluating, we instead run the planner across $50$ episodes, calling the generator at each step, and compute the average sum of rewards 
(as opposed to the lower bound value for a single belief used during training).
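
Spelled out, the evaluation metric is just the average-sum-of-rewards description above in symbols, where $r^{(e)}_t$ is the reward received at step $t$ of evaluation episode $e$ and $T_e$ is that episode's length:

$$
\text{evaluation score} = \frac{1}{50} \sum_{e=1}^{50} \sum_{t=1}^{T_e} r^{(e)}_t,
$$

while the training objective remains the single-belief DESPOT lower bound $\boldsymbol{f}(x; c)$.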

##### Problem instance distribution
For `POMDPMacro[task,length]`, the distribution of problem instances is _dynamic_. 
In online POMDP planning, it is hard to prescribe a "dataset of beliefs" from which to construct a problem instance distribution: 
the space of reachable beliefs is hard to determine beforehand, and it is too small relative to the entire belief space to sample at random. 
Instead, `POMDPMacro[task,length]` loops through episodes of planning and execution, returning the current task parameters and belief at the current step 
whenever samples from the problem instance distribution are requested. 
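
The sketch below shows one way this dynamic distribution might be consumed through OPOF's domain interface. The accessor names (`create_problem_set()`, `create_planner()`, `composite_parameter_space()`) and call conventions are assumptions based on the OPOF framework and should be checked against the OPOF documentation; treat this as an illustrative sketch rather than an authoritative API reference.

```python
from opof_pomdp.domains import POMDPMacro

domain = POMDPMacro("LightDark", 8)

# Assumed OPOF-style accessors (names and signatures are illustrative).
problems = domain.create_problem_set()        # dynamic stream of (task parameters, belief)
planner = domain.create_planner()             # wraps DESPOT with the 8 macro-actions
spaces = domain.composite_parameter_space()   # parameter spaces for the joint control vector

# Draw one problem instance and plan with randomly sampled macro-action
# parameters in place of a trained generator.
problem = problems()                                  # assumed: calling the problem set yields an instance
parameters = [space.rand(1)[0] for space in spaces]   # assumed sampling helper
result = planner(problem, parameters)                 # assumed to return a dict with the DESPOT lower bound
print(result["objective"])
```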

## Tasks

### `LightDark`

<p align="left">
    <img src="https://github.com/opoframework/opof-pomdp/blob/master/docs/_static/img/lightdark_start.png?raw=true" width="250px"/>
</p>

##### Description
The robot (blue circle) wants to move to and stop exactly at a goal location (green cross). 
However, it cannot observe its own position in the dark region (gray background), but can do so only in the light region (white vertical strip). 
It starts with uncertainty over its position (yellow particles) and should discover, through planning, 
that localizing against the light before attempting to stop at the goal will lead to a higher success rate, despite taking a longer route. 

##### Task parameters
The task is parameterized by the goal position and the position of the light strip, both of which are selected uniformly at random.

##### Recommended length
$8$ is the macro-action length known to be empirically optimal when training with GC.
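
For reference, training a generator on this task with GC might look like the following. The `GC(domain, iterations=..., eval_folder=...)` constructor arguments and the call pattern are assumptions based on OPOF's documented usage and should be verified against the OPOF package:

```python
from opof.algorithms import GC
from opof_pomdp.domains import POMDPMacro

# LightDark with the recommended macro-action length of 8.
domain = POMDPMacro("LightDark", 8)

# Constructor arguments are assumed from OPOF's usual usage pattern.
algorithm = GC(domain, iterations=100000, eval_folder="results/POMDPMacro[LightDark,8]")
algorithm()  # runs training, periodically evaluating the learned generator
```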

### `PuckPush`

<p align="left">
    <img src="https://github.com/opoframework/opof-pomdp/blob/master/docs/_static/img/puckpush_start.png?raw=true" width="500px"/>
</p>

##### Description
A circular robot (blue) pushes a circular puck (yellow circle) toward a goal (green circle). 
The world contains two vertical strips (yellow) of the same color as the puck, which prevent observations of the puck while it is on top of them. 
The robot starts with a small amount of uncertainty (red particles) over its own position and the puck's position, corresponding to sensor noise; 
this uncertainty grows as the puck moves across the vertical strips. Furthermore, since both the robot and the puck are circular, 
the puck slides across the surface of the robot whenever it is pushed. 
The robot must discover, through planning, an extremely long-horizon plan that can (i) recover localization of the puck 
and (ii) recover from the sliding effect by retracing to re-push the puck. 

##### Task parameters
The task is parameterized by the position of the goal region, which is selected uniformly at random within the white area on the right. 

##### Recommended length
$5$ is the macro-action length known to be empirically optimal when training with GC.
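
The training recipe sketched for `LightDark` above carries over directly, with only the task name and recommended length swapped in:

```python
from opof_pomdp.domains import POMDPMacro

# PuckPush with the recommended macro-action length of 5.
domain = POMDPMacro("PuckPush", 5)
```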



## Citing

TBD

## License

`opof-pomdp` is licensed under the [BSD-3 license](https://github.com/opoframework/opof-pomdp/blob/master/LICENSE.md).

`opof-pomdp` includes copies of the following libraries as dependencies. These copies are distributed according to their corresponding original licenses.
- [Boost](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/boost) ([homepage](https://github.com/boostorg/boost)): [Boost Software License](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/boost/LICENSE)
- [CARLA/SUMMIT](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/carla) ([homepage](https://github.com/AdaCompNUS/summit)): [MIT](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/carla/LICENSE)
- [DESPOT](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/despot) ([homepage](https://github.com/AdaCompNUS/despot)): [GPLv3](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/despot/LICENSE)
- [tomykira's Base64.h](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/macaron) ([homepage](https://gist.github.com/tomykaira/f0fd86b6c73063283afe550bc5d77594)): [MIT](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/macaron/LICENSE)
- [magic](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/magic) ([homepage](https://github.com/AdaCompNUS/magic)): [MIT](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/magic/LICENSE)
- [OpenCV](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/opencv) ([homepage](https://github.com/opencv/opencv/tree/4.7.0)): [Apache 2.0](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/opencv/LICENSE)
- [GAMMA/RVO2](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/rvo2) ([homepage](https://github.com/AdaCompNUS/GAMMA)): [Apache 2.0](https://github.com/opoframework/opof-pomdp/tree/master/pomdp_core/rvo2/LICENSE)

`opof-pomdp` is maintained by the [Kavraki Lab](https://www.kavrakilab.org/) at Rice University, funded in part by NSF RI 2008720 and Rice University funds.

            
