buffalo-gym

- Name: buffalo-gym
- Version: 0.1.0
- Summary: Buffalo Gym environment
- Author: foreverska
- Keywords: gymnasium, gym
- GitHub: https://github.com/foreverska/buffalo-gym
- Uploaded: 2024-11-30 03:41:59
# Buffalo Gym

A multi-armed bandit (MAB) environment for the gymnasium API.
"One-armed bandit" is a nickname for slot machines, and Buffalo
is a reference to one such slot machine that I am fond of.
MABs are an excellent playground for theoretical exercises and
for debugging RL agents, as they provide an environment that
can be reasoned about easily.  It once helped me to step back
and write an MAB to debug my DQN agent.  But there was a lack
of native gymnasium environments, so I wrote Buffalo, an
easy-to-use environment, in the hope that it might help someone else.

## Buffalo ("Buffalo-v0" | "Bandit-v0")

The default multi-armed bandit environment.  Arm center values
are drawn from a normal distribution (0, arms).  When an
arm is pulled, a value drawn from a normal distribution (0, 1)
is added to the chosen arm's center value and returned as the
reward.  This is not intended to be challenging for an agent,
but it is easy for the debugger to reason about.
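The reward mechanics described above can be sketched in a few lines. This is a stand-alone simulation, not the package's actual code, and reading "(0, arms)" as mean 0 with standard deviation equal to the number of arms is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

arms = 10
# Arm centers drawn once at setup from a normal distribution;
# "(0, arms)" is read here as mean 0, std dev = number of arms.
centers = rng.normal(loc=0.0, scale=arms, size=arms)

def pull(arm: int) -> float:
    """Reward = chosen arm's center plus standard-normal noise."""
    return float(centers[arm] + rng.normal(loc=0.0, scale=1.0))

# Averaging many pulls of one arm recovers that arm's center value,
# which is what makes the environment easy to reason about.
best_arm = int(np.argmax(centers))
rewards = [pull(best_arm) for _ in range(1000)]
```

An agent's estimated value for each arm should converge to the corresponding entry of `centers`, which the debugger can inspect directly.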

## Multi-Buffalo ("MultiBuffalo-v0" | "ContextualBandit-v0")

This serves as a contextual bandit implementation.  It is a
k-armed bandit with n states.  The current state is indicated to
the agent in the observation, and each state has different
reward offsets for each arm.  The goal of the agent is to
learn the best action for a given state.  This is
a good stepping stone to Markov decision processes.
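A minimal sketch of the reward structure this describes, using hypothetical names and a per-(state, arm) offset table rather than the package's internals:

```python
import random

random.seed(4)

arms, states = 3, 2
# Each (state, arm) pair gets its own reward offset, so the best arm
# can differ between states -- that is what makes the bandit contextual.
offsets = [[random.gauss(0.0, arms) for _ in range(arms)]
           for _ in range(states)]

def step(state: int, action: int) -> float:
    """Reward = offset for this (state, arm) pair plus unit-normal noise."""
    return offsets[state][action] + random.gauss(0.0, 1.0)

# The optimal arm, per state, is the one with the largest offset;
# an agent must condition on the observed state to find both.
best_per_state = [max(range(arms), key=lambda a: offsets[s][a])
                  for s in range(states)]
```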

This module takes an extra parameter, pace.  By default (None), a
new state is chosen on every step of the environment.  It can
be set to any integer to determine how many steps pass between
random draws of a new state.  Of course, a transition to a
different state is not guaranteed, as the next state is drawn at random.
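The pace logic can be illustrated with a small stand-alone simulation; the state-drawing code here is an assumption about the behavior, based only on the description above:

```python
import random

random.seed(1)

n_states = 2

def run(steps, pace):
    """Return the state active at each step under a given pace."""
    trace = []
    state = random.randrange(n_states)
    for t in range(steps):
        # pace=None: draw a new state every step; otherwise only every
        # `pace` steps.  The draw is random, so the "new" state may
        # happen to equal the old one.
        if pace is None or t % pace == 0:
            state = random.randrange(n_states)
        trace.append(state)
    return trace

trace = run(9, 3)  # state is re-drawn at steps 0, 3 and 6
```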

## Buffalo Trail ("BuffaloTrail-v0" | "StatefulBandit-v0")

This serves as a stateful bandit implementation.  There is a
pervasive rumor that slot machine manufacturers build in
a secret sequence of bets that triggers a large reward or the
jackpot.  It is almost certainly not true in the real world, but
it is true here.  A specific sequence of actions yields the maximum
reward.  The sequence is chosen at random during environment setup
and reported in the info dict returned by reset.  Not all sequences
are aliased, and this may be an important thing to check in an
implementation; therefore, a rudimentary algorithm to force
aliasing is included.
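The jackpot mechanic can be sketched as follows. This is a stand-alone toy with made-up reward values, not the environment's code; the real environment reports the sequence via reset:

```python
import random

random.seed(2)

arms, seq_len = 4, 3
# The secret sequence is drawn once at setup.
secret = [random.randrange(arms) for _ in range(seq_len)]

history = []

def step(action):
    """Noise reward normally; a large reward when the last seq_len
    actions match the secret sequence."""
    history.append(action)
    if history[-seq_len:] == secret:
        return 100.0          # made-up jackpot value
    return random.gauss(0.0, 1.0)

# Playing the secret sequence triggers the jackpot on its final action.
rewards = [step(a) for a in secret]
```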

## Using

Install via pip and import buffalo_gym along with gymnasium.

```
import gymnasium as gym
import buffalo_gym

env = gym.make("Buffalo-v0")
```
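Once made, the environment follows the standard Gymnasium reset/step loop. A rough sketch of that loop, using a minimal stand-in class rather than the real environment (its internals here are assumptions for illustration):

```python
import random

random.seed(3)

class TinyBandit:
    """Minimal stand-in with a Gymnasium-style reset/step surface;
    not the real Buffalo environment."""

    def __init__(self, arms=5):
        self.arms = arms
        self.centers = [random.gauss(0.0, arms) for _ in range(arms)]

    def reset(self):
        return 0, {}  # observation, info

    def step(self, action):
        reward = self.centers[action] + random.gauss(0.0, 1.0)
        # observation, reward, terminated, truncated, info
        return 0, reward, False, False, {}

env = TinyBandit()
obs, info = env.reset()
total = 0.0
for _ in range(100):
    action = random.randrange(env.arms)   # random policy for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
```

An agent slots in where the random policy is, choosing actions from past rewards.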
