lcrl

Name	lcrl
Version	0.0.9.2
home_page	https://github.com/grockious/lcrl
Summary	Logically-Constrained Reinforcement Learning
upload_time	2024-07-22 22:29:32
maintainer	None
docs_url	None
author	Hosein Hasanbeig
requires_python	>=3.8
license	The MIT License Copyright (c) 2024, Hosein Hasanbeig, University of Oxford All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	rl logic environment agent

<p align="center">
    <img width="250" src="https://raw.githubusercontent.com/grockious/lcrl/master/assets/lcrl.png">
</p>
<!--- https://i.imgur.com/6Rf2GcE.png --->

![PyPI - License](https://img.shields.io/pypi/l/lcrl)
![PyPI - Version](https://img.shields.io/pypi/v/lcrl)

# LCRL
Logically-Constrained Reinforcement Learning (LCRL) is a model-free reinforcement learning framework to synthesise
policies for unknown, continuous-state-action Markov Decision Processes (MDPs) under a given Linear Temporal Logic
(LTL) property. LCRL automatically shapes a synchronous reward function on-the-fly. This enables any
off-the-shelf RL algorithm to synthesise policies that yield traces which probabilistically satisfy the LTL property. LCRL produces policies that are certified to satisfy the given LTL property with maximum probability.

## Publications
LCRL Tool Paper:
* Hasanbeig, H., Kroening, D., Abate, A., "LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning", QEST, 2022. [[PDF]](https://arxiv.org/pdf/2209.10341.pdf)

LCRL Foundations:
* Mitta, R., Hasanbeig, H., Wang, J., Kroening, D., Kantaros, Y., Abate, A., "Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis", AAAI Special Track on Safe, Robust and Responsible AI, 2024. [[PDF]](https://arxiv.org/pdf/2312.11314.pdf)
* Hasanbeig, H., Abate, A. and Kroening, D., "Cautious Reinforcement Learning with Logical Constraints", International Conference on Autonomous Agents and Multi-agent Systems, 2020. [[PDF]](http://ifaamas.org/Proceedings/aamas2020/pdfs/p483.pdf)
* Hasanbeig, H., Kroening, D. and Abate, A., "Deep Reinforcement Learning with Temporal Logics", International Conference on Formal Modeling and Analysis of Timed Systems, 2020. [[PDF]](https://link.springer.com/content/pdf/10.1007%2F978-3-030-57628-8_1.pdf)
* Hasanbeig, H., Kroening, D. and Abate, A., "Towards Verifiable and Safe Model-Free Reinforcement Learning", Workshop on Artificial Intelligence and Formal Verification, Logics, Automata and Synthesis (OVERLAY), 2020. [[PDF]](http://ceur-ws.org/Vol-2509/invited.pdf)
* Hasanbeig, H., Kantaros, Y., Abate, A., Kroening, D., Pappas, G. J., and Lee, I., "Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees", IEEE Conference on Decision and Control, 2019. [[PDF]](https://arxiv.org/pdf/1909.05304.pdf)
* Hasanbeig, H., Abate, A. and Kroening, D., "Logically-Constrained Neural Fitted Q-Iteration", International Conference on Autonomous Agents and Multi-agent Systems, 2019. [[PDF]](https://arxiv.org/pdf/1809.07823.pdf)
* Lim Zun Yuan, Hasanbeig, H., Abate, A. and Kroening, D., "Modular Deep Reinforcement Learning with Temporal Logic Specifications", CoRR abs/1909.11591, 2019. [[PDF]](https://arxiv.org/pdf/1909.11591.pdf)
* Hasanbeig, H., Abate, A. and Kroening, D., "Certified Reinforcement Learning with Logic Guidance", CoRR abs/1902.00778, 2019. [[PDF]](https://arxiv.org/pdf/1902.00778.pdf)
* Hasanbeig, H., Abate, A. and Kroening, D., "Logically-Constrained Reinforcement Learning", CoRR abs/1801.08099, 2018. [[PDF]](https://arxiv.org/pdf/1801.08099.pdf)

## Installation
You can install LCRL from PyPI using `pip`:
```
pip3 install lcrl
```

Alternatively, you can clone this repository and install the dependencies:
```
git clone https://github.com/grockious/lcrl.git
cd lcrl
pip3 install .
```
or install directly from GitHub in a single step:
```
pip3 install git+https://github.com/grockious/lcrl.git
```

## Usage
#### Training an RL agent under an LTL property

Sample training commands can be found under the `./scripts` directory. LCRL consists of three main classes: the `MDP`, the `LDBA` automaton, and the core training algorithm `train`. Inside LCRL, the `MDP` state and the `LDBA` state are automatically synchronised, resulting in an on-the-fly product MDP structure.

&nbsp;
<p align="center">
    <img width="650" src="https://raw.githubusercontent.com/grockious/lcrl/master/assets/lcrl_overview.png">
</p>
<!--- https://i.imgur.com/uH481P0.png --->
&nbsp;

Over the product MDP, LCRL shapes a reward function based on the `LDBA` object. An optimal stationary Markov policy synthesised by LCRL on the product
MDP is guaranteed to induce a finite-memory policy on the original MDP that maximises the probability of satisfying the given LTL property. 
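
To make the synchronisation concrete, here is a minimal sketch of a single product-MDP step. It is not LCRL's implementation: `ToyMDP`, `ToyLDBA` and `product_step` are hypothetical names, the automaton encodes the toy property "eventually reach `goal`", and the reward rule (1.0 the first time the automaton enters its accepting state) is a simplification of the reward shaping described above.

```python
class ToyMDP:
    """Hypothetical corridor with states 0..4; actions are -1 (left) or +1 (right)."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Move and clamp to the corridor bounds.
        self.state = min(4, max(0, self.state + action))
        return self.state

    def state_label(self, state):
        return "goal" if state == 4 else "empty"


class ToyLDBA:
    """Hypothetical automaton for 'eventually goal': 0 --goal--> 1 (accepting)."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, label):
        if self.state == 0 and label == "goal":
            self.state = 1
        return self.state


def product_step(mdp, ldba, action):
    """Advance the MDP, feed the new state's label to the automaton,
    and return the product state (s, q) with a simplified shaped reward."""
    q_prev = ldba.state
    s_next = mdp.step(action)
    q_next = ldba.step(mdp.state_label(s_next))
    reward = 1.0 if (q_prev == 0 and q_next == 1) else 0.0
    return (s_next, q_next), reward


mdp, ldba = ToyMDP(), ToyLDBA()
mdp.reset()
ldba.reset()
trace = [product_step(mdp, ldba, +1) for _ in range(5)]
```

The learner only ever sees the pair `(s, q)`, so a memoryless policy over product states behaves like a finite-memory policy over the original MDP states.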

The package includes a number of pre-built `MDP` and `LDBA` objects, available in `lcrl.environments` and `lcrl.automata`, respectively. As an example, to train an agent for `minecraft-t1` (Table 2 in [the tool paper](https://arxiv.org/pdf/2209.10341.pdf)), start a Python interpreter and run:

```
python3
```

```python
>>> # import the LCRL trainer
>>> from lcrl.train import train
>>> # import the pre-built LDBA for minecraft-t1
>>> from lcrl.automata.minecraft_1 import minecraft_1
>>> # import the pre-built MDP for minecraft-t1
>>> from lcrl.environments.minecraft import minecraft
>>>
>>> LDBA = minecraft_1
>>> MDP = minecraft
>>>
>>> # train the agent
>>> task = train(MDP, LDBA,
...              algorithm='ql',
...              episode_num=500,
...              iteration_num_max=4000,
...              discount_factor=0.95,
...              learning_rate=0.9
...              )
```
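
For intuition about the `discount_factor` and `learning_rate` arguments, the sketch below shows the standard tabular Q-learning update, assuming `algorithm='ql'` selects Q-learning. It is a generic textbook update and not LCRL's own code; `q_update` and the state and action names are hypothetical. In LCRL the "state" would be a product state combining the MDP state and the automaton state.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions,
             learning_rate=0.9, discount_factor=0.95):
    """One tabular Q-learning update; returns the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(Q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + discount_factor * best_next
    # Move the current estimate towards the target by a step of size learning_rate.
    Q[(state, action)] += learning_rate * (td_target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
actions = ["left", "right"]
first = q_update(Q, "s0", "right", 1.0, "s1", actions)
second = q_update(Q, "s0", "right", 0.0, "s0", actions)
```

A learning rate of 0.9 moves each estimate most of the way to its target on every update, while a discount factor of 0.95 keeps rewards many steps ahead relevant to the current value.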

## Applying LCRL to a custom black-box MDP and a custom LTL property
#### - MDP:
LCRL can be connected to a black-box MDP object that is fully unknown to
the tool. Even the size of the state space need not be known in advance, as LCRL automatically keeps track of the visited states. Following the OpenAI Gym convention, the MDP object, call it `MDP`, should have at
least the following methods:
```
MDP.reset()
```
to reset the MDP state,
```
MDP.step(action)
```
to change the state of the MDP upon executing `action`,
```
MDP.state_label(state)
```
to output the label of `state`.
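
As an illustration, a custom environment exposing these three methods might look like the sketch below. The class name `GridMDP`, the tuple state encoding, the string action names, the label names, and the choice to return the new state from `reset()` and `step()` are all assumptions made for this example; the pre-built environments in `lcrl.environments` show the conventions LCRL actually uses.

```python
class GridMDP:
    """Hypothetical deterministic grid world exposing reset/step/state_label."""

    # Action name -> (row change, column change); an assumed encoding.
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=3, labels=None):
        self.size = size
        # Cells without an entry are labelled "empty".
        self.labels = labels or {(2, 2): "goal", (1, 1): "unsafe"}
        self.state = (0, 0)

    def reset(self):
        """Put the agent back in the top-left corner."""
        self.state = (0, 0)
        return self.state

    def step(self, action):
        """Apply the action, clamping the position to the grid."""
        d_row, d_col = self.ACTIONS[action]
        row = min(self.size - 1, max(0, self.state[0] + d_row))
        col = min(self.size - 1, max(0, self.state[1] + d_col))
        self.state = (row, col)
        return self.state

    def state_label(self, state):
        """Return the label that the automaton will read for this state."""
        return self.labels.get(tuple(state), "empty")


env = GridMDP()
start = env.reset()
visited = [env.step(a) for a in ["down", "right", "up", "up"]]
```

Note that the environment defines no reward of its own: only labels are exposed, and the reward is derived from the automaton.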

#### - LTL:
The LTL property has to be converted to an LDBA, which is a finite-state machine.
An excellent tool for this is OWL, which you can [try online](https://owl.model.in.tum.de/try/).
The synthesised LDBA can be used as an object of the class `lcrl.automata.ldba`.  

The constructed LDBA, call it `LDBA`, is expected to offer the following methods:
```
LDBA.reset()
```
to reset the automaton state and its accepting frontier function,
```
LDBA.step(label)
```
to change the state of the automaton upon reading `label`,
```
LDBA.accepting_frontier_function(state)
```
to update the accepting frontier set. This method is already included in the class `lcrl.automata.ldba`, so for a custom `LDBA` object you only need to instantiate this class and specify the `reset()` and `step(label)` methods.  
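
The sketch below illustrates the three methods on a standalone toy automaton. It deliberately does not use `lcrl.automata.ldba`, whose constructor and internals are not described here. `ToyLDBA` is a hypothetical class for the property "visit `a` and `b` infinitely often", with two accepting sets, and its frontier logic is only one plausible reading of the accepting frontier idea: an accepting set is removed from the frontier when visited, and the frontier is refilled once it empties. The boolean return value is likewise an assumption.

```python
class ToyLDBA:
    """Hypothetical automaton: state 1 means 'just read a', state 2 'just read b'."""

    def __init__(self):
        # Two accepting sets; both must be visited infinitely often.
        self.accepting_sets = [{1}, {2}]
        self.reset()

    def reset(self):
        """Reset the automaton state and the accepting frontier."""
        self.state = 0
        self.frontier = set(range(len(self.accepting_sets)))
        return self.state

    def step(self, label):
        """Move to the state determined by the label just read."""
        self.state = {"a": 1, "b": 2}.get(label, 0)
        return self.state

    def accepting_frontier_function(self, state):
        """Remove visited accepting sets from the frontier.

        Returns True if the frontier changed, which a trainer could
        use as the signal to hand out a reward (an assumption here).
        """
        hit = False
        for j, accepting in enumerate(self.accepting_sets):
            if state in accepting and j in self.frontier:
                self.frontier.discard(j)
                hit = True
                if not self.frontier:
                    # All sets visited: start a new round without set j.
                    self.frontier = set(range(len(self.accepting_sets))) - {j}
        return hit


ldba = ToyLDBA()
hits = [ldba.accepting_frontier_function(ldba.step(label))
        for label in ["a", "a", "b", "a"]]
```

Reading `a` twice in a row is rewarded only once, so the agent gains nothing by revisiting a single accepting set and has to keep cycling through all of them.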

## Reference
Please cite our tool paper and this repository if you use LCRL in your publications:

```
@inproceedings{lcrl_tool_paper,
title={{LCRL}: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning},
author={Hasanbeig, Hosein and Kroening, Daniel and Abate, Alessandro},
booktitle={International Conference on Quantitative Evaluation of SysTems},
year={2022},
organization={Springer}
}
```
```
@misc{lcrl_repo,
  title={Logically-Constrained Reinforcement Learning Code Repository},
  author={Hasanbeig, Hosein and Kroening, Daniel and Abate, Alessandro},
  year={2022}
}
```

## License
This project is licensed under the terms of the [MIT License](/LICENSE).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/grockious/lcrl",
    "name": "lcrl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "rl, logic, environment, agent",
    "author": "Hosein Hasanbeig",
    "author_email": "Hosein Hasanbeig <hosein.hasanbeig@icloud.com>",
    "download_url": "https://files.pythonhosted.org/packages/ae/11/0c8a79b0b7d5768f3cb3cef7cc2a834793ebdc0ae700987990bc46868d10/lcrl-0.0.9.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "The MIT License  Copyright (c) 2024, Hosein Hasanbeig, University of Oxford All rights reserved.  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "Logically-Constrained Reinforcement Learning",
    "version": "0.0.9.2",
    "project_urls": {
        "Homepage": "https://github.com/grockious/lcrl"
    },
    "split_keywords": [
        "rl",
        " logic",
        " environment",
        " agent"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5d82c495d18e6f5a339b00be506a19486cecc815ca1ea591adfa9a7375038490",
                "md5": "419968bed03ddb4e4d482ba0f583abe2",
                "sha256": "c66fbcc35029e08d37542b2d37055fc2b21843956519f67655e3fa5803c034ed"
            },
            "downloads": -1,
            "filename": "lcrl-0.0.9.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "419968bed03ddb4e4d482ba0f583abe2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 57032,
            "upload_time": "2024-07-22T22:29:31",
            "upload_time_iso_8601": "2024-07-22T22:29:31.072497Z",
            "url": "https://files.pythonhosted.org/packages/5d/82/c495d18e6f5a339b00be506a19486cecc815ca1ea591adfa9a7375038490/lcrl-0.0.9.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ae110c8a79b0b7d5768f3cb3cef7cc2a834793ebdc0ae700987990bc46868d10",
                "md5": "cf433d5d2dcdef017a008c8d31230310",
                "sha256": "51837eb380c465806bbb70d0a4444e3a662e72c8155136f4c5afd3a59d57c211"
            },
            "downloads": -1,
            "filename": "lcrl-0.0.9.2.tar.gz",
            "has_sig": false,
            "md5_digest": "cf433d5d2dcdef017a008c8d31230310",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 301519,
            "upload_time": "2024-07-22T22:29:32",
            "upload_time_iso_8601": "2024-07-22T22:29:32.612070Z",
            "url": "https://files.pythonhosted.org/packages/ae/11/0c8a79b0b7d5768f3cb3cef7cc2a834793ebdc0ae700987990bc46868d10/lcrl-0.0.9.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-22 22:29:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "grockious",
    "github_project": "lcrl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "lcrl"
}
