| Field | Value |
| --- | --- |
| Name | buffalo-gym |
| Version | 0.1.0 |
| Summary | Buffalo Gym environment |
| Author | foreverska |
| Maintainer | None |
| Home page | None |
| VCS | https://github.com/foreverska/buffalo-gym |
| License | None |
| Requires Python | None |
| Keywords | gymnasium, gym |
| Requirements | No requirements were recorded. |
| Upload time | 2024-11-30 03:41:59 |
# Buffalo Gym
A multi-armed bandit (MAB) environment for the gymnasium API.
"One-armed bandit" is a nickname for slot machines, and Buffalo
is one such slot machine that I am fond of. MABs are an excellent
playground for theoretical exercises and for debugging RL agents,
since they provide an environment that can be reasoned about easily.
It once helped me to step back and write an MAB to debug my DQN
agent. But native gymnasium bandit environments were lacking, so I
wrote Buffalo, an easy-to-use environment that might help someone else.
## Buffalo ("Buffalo-v0" | "Bandit-v0")
Default multi-armed bandit environment. Arm center values
are drawn from a normal distribution (0, arms). When an
arm is pulled, a random value drawn from a normal
distribution (0, 1) is added to the chosen arm's center
value. This is not intended to be challenging for an agent,
but it is easy for a debugger to reason about.
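As a quick illustration, here is a minimal debugging loop against this environment. It is a sketch that assumes only the standard gymnasium reset/step API; the episode handling, seed, and loop length are arbitrary choices, not part of the package.

```
import gymnasium as gym
import buffalo_gym  # importing registers the Buffalo environments

env = gym.make("Buffalo-v0")
obs, info = env.reset(seed=0)

# Pull random arms and track the average reward seen per arm.
totals, counts = {}, {}
for _ in range(1000):
    action = int(env.action_space.sample())  # pick a random arm
    obs, reward, terminated, truncated, info = env.step(action)
    totals[action] = totals.get(action, 0.0) + float(reward)
    counts[action] = counts.get(action, 0) + 1
    if terminated or truncated:
        obs, info = env.reset()

# Arms with higher center values should show higher average reward.
print({arm: totals[arm] / counts[arm] for arm in sorted(totals)})
```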
## Multi-Buffalo ("MultiBuffalo-v0" | "ContextualBandit-v0")
This serves as a contextual bandit implementation. It is a
k-armed bandit with n states. The current state is indicated to
the agent in the observation, and each state has a different
reward offset for each arm. The goal of the agent is to
learn the best action for each state (context). This is
a good stepping stone to Markov decision processes.
This module has an extra parameter, pace. By default (None), a
new state is chosen on every step of the environment. It can
be set to any integer to control how many steps pass between
randomly choosing a new state. Of course, a transition to a new
state is not guaranteed, since the next state is chosen at random.
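For example, assuming pace is exposed as a keyword argument to gym.make (an assumption here; check the environment's constructor for the exact name), a contextual run might look like this sketch:

```
import gymnasium as gym
import buffalo_gym

# pace=10 (assumed kwarg) would re-draw the state every 10 steps
# instead of on every step.
env = gym.make("MultiBuffalo-v0", pace=10)

obs, info = env.reset(seed=0)
for _ in range(50):
    # A real contextual agent would choose its arm based on obs (the state).
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```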
## Buffalo Trail ("BuffaloTrail-v0" | "StatefulBandit-v0")
This serves as a stateful bandit implementation. There is a
pervasive rumor that slot machine manufacturers build in
a secret sequence of bets that triggers a large reward or the
jackpot. That is almost certainly not true in the real world, but
it is true here: a specific sequence of actions yields the maximum
reward. The sequence is chosen randomly at environment setup and
reported in the info dict returned by reset. Not all randomly chosen
sequences are aliased, and this may be an important property to check
in an implementation; therefore, a rudimentary algorithm to force
aliasing is included.
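A minimal sketch of inspecting the secret sequence follows. The exact key under which the sequence appears in the info dict is not documented here, so the sketch prints the whole dict rather than relying on a particular name.

```
import gymnasium as gym
import buffalo_gym

env = gym.make("BuffaloTrail-v0")
obs, info = env.reset(seed=0)

# The secret sequence is reported in reset's info dict; print it to find
# the key used by this version, then replay that sequence of actions
# step by step to collect the maximum reward.
print(info)
```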
## Using
Install via pip, then import buffalo_gym alongside gymnasium so the environments are registered.
```
import gymnasium as gym
import buffalo_gym

env = gym.make("Buffalo-v0")
```