skyline-rl-lab

Name: skyline-rl-lab
Version: 0.0.1.2
Home page: https://github.com/google/skyline_rl_lab
Summary: A package to provide RL capability in tasks.
Upload time: 2024-08-21 07:32:44
Author: John Lee
License: MIT License
## skyline_rl_lab
We are going to implement and experiment with RL algorithms in this repo for our research and tutoring purposes. Below we explain how to use this repo with a simple example.

## Environment
For [**RL**](https://developers.google.com/machine-learning/glossary/rl) (Reinforcement Learning) to work, we need an environment to interact with. From Skyline lab, we can list the supported environments as below:
```python
>>> from skyline import lab
>>> lab.list_env()
===== GridWorld =====
This is an environment to showcase Skyline lab. The environment is a grid world where you can move up, down, right and left if you don't encounter an obstacle. When you obtain the reward (-1, 1, 2), the game is over. You can use env.info() to learn more.
```
Then we use the function `make` to create the desired environment, e.g.:
```python
>>> grid_env = lab.make(lab.Env.GridWorld)
>>> grid_env.info()
- environment is a grid world
- x means you can't go there
- s means start position
- number means reward at that state
===========
.  .  .  1
.  x  . -1
.  .  .  x
s  x  .  2
===========
```
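
For intuition, the grid above can be read as a small data structure: a start cell, a set of blocked cells, and a mapping from terminal cells to rewards. Below is a minimal illustrative sketch of that layout (the names are assumptions for illustration; this is not how the library stores the grid internally):

```python
# Illustrative only: one way to encode the 4x4 grid printed by grid_env.info().
START = (3, 0)                                # the 's' cell
OBSTACLES = {(1, 1), (2, 3), (3, 1)}          # the 'x' cells you cannot enter
REWARDS = {(0, 3): 1, (1, 3): -1, (3, 3): 2}  # terminal cells and their rewards

def is_terminal(cell):
    """The game is over once a reward cell is reached."""
    return cell in REWARDS
```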

Available actions are indicated as follows:
```python
>>> grid_env.available_actions()
['U', 'D', 'L', 'R']
```

To get the current state of an environment:
```python
>>> grid_env.current_state
GridState(i=3, j=0)
```

In this specific scenario, the starting position (`s`) is located at coordinates `(3, 0)`, where `i` indexes rows from the top and `j` indexes columns from the left.

Let's take an action and check how the state changes in the environment:
```python
>>> grid_env.step('U')  # Take action 'Up'
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)

>>> grid_env.current_state  # Get current state
GridState(i=2, j=0)
```

After taking action `U`, we expect the `i` coordinate to decrease from 3 to 2 (moving up), and we can confirm this from the returned action result. Let's reset the environment by calling the <font color='blue'>reset</font> method, which brings the environment back to its initial state `GridState(i=3, j=0)`:
```python
>>> grid_env.reset()
>>> grid_env.current_state
GridState(i=3, j=0)
```

## Experiments of RL algorithms
Here we are going to test some well-known RL algorithms and demonstrate the
usage of this lab. Every RL method we implement must follow the protocol
<font color='blue'>**RLAlgorithmProto**</font> defined in
[`rl_protos.py`](skyline/rl_protos.py). We will take a look at the
implementation of some RL methods to see how they are used.
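
As a rough, hedged sketch (the authoritative definition lives in [`rl_protos.py`](skyline/rl_protos.py)), an RL method in this lab can be pictured as an object exposing `fit` (learn from an environment) and `play` (take one step in it), since those are the two methods used throughout this README:

```python
# Illustrative sketch only; see skyline/rl_protos.py for the real protocol.
from typing import Any, Protocol


class RLAlgorithmLike(Protocol):  # hypothetical name, not the actual class
    def fit(self, env: Any) -> None:
        """Learn from the given environment (may be a no-op for some methods)."""

    def play(self, env: Any) -> Any:
        """Take a single action in `env` and return the resulting ActionResult."""
```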

<a id='monte_carlo_method'></a>
### Monte Carlo Method
<b>In this method, we simply simulate many trajectories</b> (<font color='brown'>decision processes</font>)<b>, and calculate the average returns.</b> ([wiki page](https://en.wikiversity.org/wiki/Reinforcement_Learning#Monte_Carlo_policy_evaluation))
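
As a rough illustration of the idea (not the code in [`monte_carlo.py`](skyline/alg/monte_carlo.py)), first-visit Monte Carlo policy evaluation can be sketched as: sample many episodes, then estimate each state's value as the average of the returns observed from its first visits. The function and variable names below are assumed for this simplified sketch:

```python
# Hedged sketch of first-visit Monte Carlo policy evaluation; the real
# implementation in skyline/alg/monte_carlo.py also improves a policy and
# differs in its details.
from collections import defaultdict


def mc_evaluate(sample_episode, num_episodes=1000, gamma=0.9):
    """`sample_episode()` must return a list of (state, reward) pairs."""
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = sample_episode()
        g = 0.0
        first_return = {}
        # Walk the episode backwards, accumulating the discounted return; the
        # last overwrite for a state corresponds to its first visit in time.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_return[state] = g
        for state, g0 in first_return.items():
            returns[state].append(g0)
    # The value estimate is the average return observed from each state.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```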

We implement this algorithm in [`monte_carlo.py`](skyline/alg/monte_carlo.py). The code snippet below will initialize this RL method:
```python
>>> from skyline.alg import monte_carlo
>>> mc_alg = monte_carlo.MonteCarlo()
```

Each RL method object supports a `fit` method to learn from the given
environment object. For example:
```python
>>> mc_alg.fit(grid_env)
```

Then we can leverage the utility module [`gridworld_utils.py`](skyline/lab/gridworld_utils.py) to print out what has been learned. Below is the learned [value function](https://en.wikipedia.org/wiki/Reinforcement_learning#Value_function) from the Monte Carlo method:
```python
>>> from skyline.lab import gridworld_utils
>>> gridworld_utils.print_values(mc_alg._state_2_value, grid_env)
---------------------------
 1.18| 1.30| 1.46| 1.00|
---------------------------
 1.31| 0.00| 1.62|-1.00|
---------------------------
 1.46| 1.62| 1.80| 0.00|
---------------------------
 1.31| 0.00| 2.00| 2.00|
```

Then let's check the learned policy:
```python
>>> gridworld_utils.print_policy(mc_alg._policy, grid_env)
---------------------------
  D  |  R  |  D  |  ?  |
---------------------------
  D  |  x  |  D  |  ?  |
---------------------------
  R  |  R  |  D  |  x  |
---------------------------
  U  |  x  |  R  |  ?  |
```

Finally, we can use the trained Monte Carlo method object to interact with the
environment. Below is sample code for reference:
```python
# Play the game until done
grid_env.reset()

print(f'Begin state={grid_env.current_state}')
step_count = 0
while not grid_env.is_done:
    result = mc_alg.play(grid_env)
    step_count += 1
    print(result)

print(f'Final reward={result.reward} with {step_count} step(s)')
```

The execution would look like:
```shell
Begin state=GridState(i=3, j=0)
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=1), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=2, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='D', state=GridState(i=3, j=2), reward=0, is_done=False, is_truncated=False, info=None)
ActionResult(action='R', state=GridState(i=3, j=3), reward=2, is_done=True, is_truncated=False, info=None)
Final reward=2 with 5 step(s)
```

### Random Method
This method takes random action(s) in the given environment. It is often used as a baseline to evaluate other RL methods. The code below will instantiate a Random RL method:

```python
from skyline.alg import random_rl

random_alg = random_rl.RandomRL()
```
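
Conceptually, such a baseline needs no learned state at all: it just samples a legal action at each step. Below is a hedged, illustrative sketch of that idea (the class name and bodies are assumptions, not the code in [`random_rl.py`](skyline/alg/random_rl.py)):

```python
# Illustrative sketch of a random baseline; see skyline/alg/random_rl.py
# for the actual RandomRL implementation.
import random


class RandomBaseline:
    def fit(self, env):
        """Nothing to learn: a random policy needs no training."""

    def play(self, env):
        """Pick a uniformly random legal action and take one step."""
        action = random.choice(env.available_actions())
        return env.step(action)
```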

The Random RL method doesn't require any training, so calling the `fit` method of
`random_alg` returns immediately:

```python
# Training
random_alg.fit(grid_env)
```

Since this is a random process, each play of the game will very likely produce a different result:
```python
# Play the game until done
grid_env.reset()

print(f'Begin state={grid_env.current_state}')
step_count = 0
while not grid_env.is_done:
    result = random_alg.play(grid_env)
    step_count += 1
    print(result)
print(f'Final reward={result.reward} with {step_count} step(s)')
```

Below is one execution example:
```shell
Begin state=GridState(i=3, j=0)
ActionResult(action='U', state=GridState(i=2, j=0), reward=0, is_done=False, is_truncated=False, info=None)
...
ActionResult(action='R', state=GridState(i=0, j=3), reward=1, is_done=True, is_truncated=False, info=None)
Final reward=1 with 16 step(s)
```

From the result above, the random RL method took more steps and is not guaranteed to obtain the best reward. Therefore, the [**Monte Carlo method**](#monte_carlo_method) clearly performs much better than the Random RL method!

## How to rank RL methods
Before we introduce how the scoreboard works, we need to understand
[**RLExaminer**](#rlexaminer) first. Basically, the scoreboard is designed to help you rank
different RL methods.

<a id='rlexaminer'></a>
### RLExaminer
Every environment can have more than one examiner to calculate the score of an RL method. Each examiner may evaluate the RL method from its own aspect (time, reward, etc.). Let's check the one used to calculate the average reward in the grid environment:

```python
from skyline.lab import gridworld_env

# This examiner considers both reward and number of steps.
examiner = gridworld_env.GridWorldExaminer()
```
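
As a hedged illustration of what such an examiner measures (not the actual `GridWorldExaminer` code), a reward-per-step score can be computed by replaying one episode with the method under test; the function name below is an assumption for illustration:

```python
# Illustrative sketch only: one way to compute a reward-per-step score.
# The real GridWorldExaminer in skyline.lab.gridworld_env may differ.
def reward_per_step_score(rl_method, env):
    env.reset()
    steps = 0
    result = None
    while not env.is_done:
        result = rl_method.play(env)
        steps += 1
    return result.reward / steps if steps else 0.0
```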

Then, what is the score of the Monte Carlo method?
```python
# Monte Carlo will get reward 2 by taking 5 steps.
# So the score will be reward / steps: 2 / 5 = 0.4
examiner.score(mc_alg, grid_env)
```

The [Monte Carlo method](#monte_carlo_method) got a score of 0.4. Let's check another RL method, the Random method:

```python
# The number of steps required by random RL method is unknown.
# Also the best reward is not guaranteed. So the score here will be random.
examiner.score(random_alg, grid_env)
```

The Random RL method usually gets a lower score than the Monte Carlo method.

### Scoreboard
The scoreboard calculates the scores of the given RL methods according to the specified examiner and then ranks those RL methods accordingly:

```python
from skyline import lab

score_board = lab.Scoreboard()
sorted_scores = score_board.rank(
    examiner=examiner, env=grid_env, rl_methods=[random_alg, mc_alg])
```

The following output will be produced:
```
+-------+------------+---------------------+
| Rank. |  RL Name   |        Score        |
+-------+------------+---------------------+
|   1   | MonteCarlo |         0.4         |
|   2   |  RandomRL  | 0.13333333333333333 |
+-------+------------+---------------------+
```
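
For intuition, the ranking boils down to scoring each method with the same examiner and sorting in descending order. A minimal sketch under that assumption (illustrative only, not the library's `Scoreboard` internals):

```python
# Hedged sketch of the ranking idea behind lab.Scoreboard.rank().
def rank_methods(examiner, env, rl_methods):
    scores = [(type(m).__name__, examiner.score(m, env)) for m in rl_methods]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```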

## Resources
* [Machine Learning Glossary: Reinforcement Learning](https://developers.google.com/machine-learning/glossary/rl)
* [Tensorflow - Introduction to RL and Deep Q Networks](https://www.tensorflow.org/agents/tutorials/0_intro_rl)
* [Udemy - Artificial Intelligence: Reinforcement Learning in Python](https://www.udemy.com/course/artificial-intelligence-reinforcement-learning-in-python/)

            
