# Buffalo Gym
A multi-armed bandit (MAB) environment for the gymnasium API.
One-armed Bandit is a reference to slot machines, and Buffalo
is a reference to one such slot machine that I am fond
of. MABs are an excellent playground for theoretical exercise and
debugging of RL agents as they provide an environment that
can be reasoned about easily. It helped me once to step back
and write an MAB to debug my DQN agent. But there was a lack
of native gymnasium environments, so I wrote Buffalo, an easy-to-use
environment, in the hope that it might help someone else.
## Standard Bandit Problems
### Buffalo ("Buffalo-v0" | "Bandit-v0")
Default multi-armed bandit environment. Arm center values
are drawn from a normal distribution (0, arms). When an
arm is pulled, a random value is drawn from a normal
distribution (0, 1) and added to the chosen arm center
value. This is not intended to be challenging for an agent,
but to be easy to reason about while debugging one.
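A minimal interaction sketch, assuming the environment follows the
standard gymnasium reset/step API:

```
import gymnasium as gym
import buffalo_gym  # importing registers the Buffalo environments

env = gym.make("Buffalo-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # pull a random arm
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```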
### Multi-Buffalo ("MultiBuffalo-v0" | "ContextualBandit-v0")
This serves as a contextual bandit implementation. It is a
k-armed bandit with n states. The current state is indicated to
the agent in the observation, and each state has different
reward offsets for each arm. The goal of the agent is to
learn the best action for each state. This is
a good stepping stone to Markov Decision Processes.
This module has an extra parameter, pace. By default (None), a
new state is chosen on every step of the environment. It can
be set to any integer to control how many steps elapse between
randomly drawing a new state. Of course, transitioning to a new
state is not guaranteed, as the next state is chosen at random.
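A short sketch of the pace option; passing it as a keyword argument
to gym.make is an assumption:

```
import gymnasium as gym
import buffalo_gym

# Assumption: pace is accepted as a gym.make keyword argument.
env = gym.make("MultiBuffalo-v0", pace=5)  # redraw the state every 5 steps
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```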
### DuelingBuffalo ("DuelingBuffalo-v0" | "DuelingBandit-v0")
Yue et al. (2012) introduced the dueling bandit variant to model
situations with only relative feedback. The agent pulls two levers
simultaneously; the feedback is which lever provided the better
reward. This restriction means the agent cannot observe raw rewards
and must continually compare arms to determine the best one. Given
the reward-centric structure of gymnasium returns, we instead
give a reward of 1 if the first chosen arm paid out more than the
second. The agent must choose two arms, and they cannot be the same.
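A minimal sketch; treating the action as a pair of distinct arm
indices is an assumption drawn from the description above:

```
import gymnasium as gym
import buffalo_gym

env = gym.make("DuelingBuffalo-v0")
obs, info = env.reset()
action = env.action_space.sample()  # assumed: two distinct arm indices
obs, reward, terminated, truncated, info = env.step(action)
# reward == 1 when the first chosen arm out-pulled the second
```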
### BoundlessBuffalo ("BoundlessBuffalo-v0" | "InfiniteArmedBandit-v0")
Built from the Wikipedia entry based on Agrawal, 1995 (paywalled),
BoundlessBuffalo approximates the infinite-armed bandit problem.
The reward is the chosen action evaluated by a polynomial of
degree n whose coefficients are randomly sampled from (-0.1, 0.1).
This environment tests an algorithm's ability to find an optimal
input in a continuous space. The dynamic drawing of new coefficients
challenges algorithms to continually adapt to a changing landscape.
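To illustrate the reward structure described above (a sketch only;
the degree and evaluation details here are assumptions):

```
import numpy as np

# Illustrative only: a degree-3 polynomial with coefficients drawn
# from (-0.1, 0.1), evaluated at the chosen continuous action.
rng = np.random.default_rng(0)
coeffs = rng.uniform(-0.1, 0.1, size=4)  # degree n = 3
action = 0.5
reward = np.polyval(coeffs, action)
print(reward)
```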
## Nonstandard Bandit Problems
### Buffalo Trail ("BuffaloTrail-v0" | "StatefulBandit-v0")
A Stateful Bandit builds on the Contextual Bandit by relaxing
the assumption that rewards depend only on the current state.
In this framework, the environment incorporates a memory of past
states, granting the maximum reward only when a specific sequence
of states has occurred and the agent selects the correct action.
This setup isolates an agent's ability to track history and infer
belief states, without introducing the confounding factor of
exploration, as the agent cannot control state transitions. Stateful
Bandits provide a targeted environment for studying history-dependent
decision-making and state estimation.
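A minimal sketch of giving a memoryless agent a short observation
history, assuming the standard gymnasium API; the window length is
arbitrary:

```
from collections import deque

import gymnasium as gym
import buffalo_gym

env = gym.make("BuffaloTrail-v0")
obs, info = env.reset()
history = deque([obs] * 3, maxlen=3)  # last three observations
for _ in range(10):
    action = env.action_space.sample()  # an agent would act on list(history)
    obs, reward, terminated, truncated, info = env.step(action)
    history.append(obs)
```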
### Symbolic State ("SymbolicStateBandit-v0")
In real slots, the state of the bandit has little to no impact on
the underlying rewards. Plenty of flashing lights and game modes
serve only to keep the player engaged. This SymbolicStateBandit
(SSB) formulation simulates this. The states do not correlate
with the underlying rewards in this contextual bandit.
When dynamic_rate is None, the rewards stay the same even as the
state changes; setting dynamic_rate equal to pace randomly redraws
the arms with every state change, and any other value produces
further uncorrelated behavior. This environment serves as a test bed
for the "worst case" scenario for a bandit/reinforcement learner. It
measures how well an agent generalizes and how it performs when the
environment breaks its typical assumptions.
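A sketch of the configurations described above; passing pace and
dynamic_rate through gym.make is an assumption:

```
import gymnasium as gym
import buffalo_gym

# Assumption: pace and dynamic_rate are gym.make keyword arguments.
static = gym.make("SymbolicStateBandit-v0", dynamic_rate=None)  # rewards never change
shifting = gym.make("SymbolicStateBandit-v0", pace=5, dynamic_rate=5)  # arms redrawn with each state change
```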
## Using
Install via pip (`pip install buffalo-gym`) and import buffalo_gym along with gymnasium.
```
import gymnasium as gym
import buffalo_gym

env = gym.make("Buffalo-v0")
```