hindsight-replay


Name: hindsight-replay
Version: 0.0.1
Home page: https://github.com/kyegomez/HindsightReplay
Summary: Hindsight - Pytorch
Upload time: 2023-10-25 02:41:47
Author: Kye Gomez
Requires Python: >=3.6,<4.0
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Hindsight Experience Replay (HER)

Hindsight Experience Replay (HER) is a reinforcement learning technique that makes use of failed experiences to learn how to achieve goals. It does this by storing additional transitions in the replay buffer where the goal is replaced with the achieved state. This allows the agent to learn from a hindsight perspective, as if it had intended to reach the achieved state from the beginning.
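
To make this concrete, here is a minimal sketch of hindsight relabeling for a single transition. The tuple layout and the sparse reward function are illustrative assumptions for this sketch, not this package's API:

```python
import numpy as np

def sparse_reward(achieved, goal, tol=1e-3):
    # Hypothetical sparse reward: 0 when the goal is (approximately) reached, -1 otherwise
    return 0.0 if np.allclose(achieved, goal, atol=tol) else -1.0

def hindsight_relabel(state, action, next_state, goal):
    # Alongside the original transition, build a copy whose goal is the state the
    # agent actually reached, with the reward recomputed for that substituted goal
    original = (state, action, sparse_reward(next_state, goal), next_state, goal)
    relabeled = (state, action, sparse_reward(next_state, next_state), next_state, next_state)
    return original, relabeled
```

The relabeled copy always receives the "goal reached" reward, which is what gives the agent a useful learning signal even when the original episode failed to reach its intended goal.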

## Implementation

This repository contains a Python implementation of HER using PyTorch. The main class is `HindsightExperienceReplay`, which represents a replay buffer that stores transitions and allows for sampling mini-batches of transitions.

The `HindsightExperienceReplay` class takes the following arguments:

-   `state_dim`: The dimension of the state space.
-   `action_dim`: The dimension of the action space.
-   `buffer_size`: The maximum size of the replay buffer.
-   `batch_size`: The size of the mini-batches to sample.
-   `goal_sampling_strategy`: A function that takes a tensor of goals and returns a tensor of goals. This function is used to dynamically sample goals for replay.

The `HindsightExperienceReplay` class has the following methods:

-   `store_transition(state, action, reward, next_state, done, goal)`: Stores the transition in the replay buffer, along with an additional hindsight transition in which the goal is replaced by the achieved state.
-   `sample()`: Samples a mini-batch of transitions from the replay buffer and applies the goal sampling strategy to the goals.
-   `__len__()`: Returns the current size of the replay buffer.

## Usage

Here is an example of how to use the `HindsightExperienceReplay` class:

```python
import numpy as np
import torch

# Import the replay buffer (adjust the import path to match your package layout)
from hindsight_replay import HindsightExperienceReplay

# Define a goal sampling strategy: perturb each goal with Gaussian noise
def goal_sampling_strategy(goals):
    noise = torch.randn_like(goals) * 0.1
    return goals + noise

# Define the dimensions of the state and action spaces, the buffer size, and the batch size
state_dim = 10
action_dim = 2
buffer_size = 10000
batch_size = 64

# Create an instance of the HindsightExperienceReplay class
her = HindsightExperienceReplay(state_dim, action_dim, buffer_size, batch_size, goal_sampling_strategy)

# Store a transition
state = np.random.rand(state_dim)
action = np.random.rand(action_dim)
reward = np.random.rand()
next_state = np.random.rand(state_dim)
done = False
goal = np.random.rand(state_dim)
her.store_transition(state, action, reward, next_state, done, goal)

# Sample a mini-batch of transitions (None is returned until enough transitions are stored)
sampled_transitions = her.sample()
if sampled_transitions is not None:
    states, actions, rewards, next_states, dones, goals = sampled_transitions
```


In this example, we first define a goal sampling strategy function and the dimensions of the state and action spaces, the buffer size, and the batch size. We then create an instance of the `HindsightExperienceReplay` class, store a transition, and sample a mini-batch of transitions. The states, actions, rewards, next states, done flags, and goals are returned as separate tensors.
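
Once sampled, the batch would typically feed a goal-conditioned policy or value network in a training loop. The sketch below shows one possible way to consume the batch, continuing from the example above; the critic architecture, learning rate, and simplified one-step loss are illustrative assumptions and not part of this package:

```python
import torch
import torch.nn as nn

# Hypothetical goal-conditioned critic Q(state, action, goal) -> scalar value
# (state_dim, action_dim, and her come from the example above)
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim + state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

sampled_transitions = her.sample()
if sampled_transitions is not None:
    states, actions, rewards, next_states, dones, goals = sampled_transitions
    # Condition the critic on the (possibly relabeled) goals
    q_pred = critic(torch.cat([states.float(), actions.float(), goals.float()], dim=-1)).squeeze(-1)
    # One-step regression target using only the immediate reward;
    # a real agent would also bootstrap from next_states and dones
    loss = nn.functional.mse_loss(q_pred, rewards.float().reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because `sample()` has already applied the goal sampling strategy, the network is trained on relabeled goals without any extra bookkeeping in the training loop.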

## Customizing the Goal Sampling Strategy

The `HindsightExperienceReplay` class allows you to define your own goal sampling strategy by passing a function to the constructor. This function should take a tensor of goals and return a tensor of goals.

Here is an example of a goal sampling strategy function that adds random noise to the goals:

```python
def goal_sampling_strategy(goals):
    # Add zero-mean Gaussian noise with a standard deviation of 0.1 to each goal
    noise = torch.randn_like(goals) * 0.1
    return goals + noise
```

In this example, the function adds Gaussian noise with a standard deviation of 0.1 to the goals. You can customize this function to implement any goal sampling strategy that suits your needs.
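
Since the strategy is just a function from a tensor of goals to a tensor of goals, other schemes are easy to plug in. As a further illustration, the hypothetical strategy below shuffles goals within the sampled batch, loosely in the spirit of the "episode"-style relabeling from the HER paper:

```python
import torch

def shuffled_goal_sampling_strategy(goals):
    # Relabel each transition with a goal taken from another transition in the same batch
    perm = torch.randperm(goals.shape[0], device=goals.device)
    return goals[perm]
```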

## Contributing

Contributions to this project are welcome. If you find a bug or think of a feature that would be nice to have, please open an issue. If you want to contribute code, please fork the repository and submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](https://domain.apac.ai/LICENSE) file for details.
            
