softmax-exploration

Name: softmax-exploration
Version: 0.1
Home page: https://github.com/yourusername/softmax-exploration
Summary: Softmax exploration functions for reinforcement learning
Author: Your Name
Requires Python: >=3.6
Keywords: reinforcement-learning, softmax, exploration, boltzmann, q-learning
Upload time: 2025-08-11 03:59:39
# Softmax Exploration Package

A Python package for implementing softmax exploration strategies in reinforcement learning algorithms.

## Installation

```bash
pip install softmax-exploration
```

## Features

- **Softmax Action Selection**: Convert Q-values to action probabilities using softmax function
- **Boltzmann Exploration**: Temperature-controlled exploration strategy
- **Epsilon-Softmax**: Hybrid approach combining epsilon-greedy with softmax
- **Adaptive Temperature**: Dynamic temperature scheduling for exploration decay
- **Numerical Stability**: Robust implementation with overflow protection
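The overflow protection mentioned above is typically done by subtracting the maximum Q-value before exponentiating, which shifts the exponents without changing the resulting distribution. A minimal sketch of that technique (illustrative, not the package's actual source):

```python
import numpy as np

def stable_softmax(q_values, temperature=1.0):
    """Softmax with max-subtraction so np.exp never overflows."""
    q = np.asarray(q_values, dtype=float) / temperature
    q -= q.max()                 # largest exponent is now 0
    exp_q = np.exp(q)
    return exp_q / exp_q.sum()

# Works even for Q-values that would overflow a naive exp()
print(stable_softmax([1000.0, 1001.0, 1002.0]))
```

Because softmax is invariant to a constant shift of its inputs, the shifted and unshifted versions are mathematically identical; only the floating-point behavior differs.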

## Usage

### Basic Softmax Exploration

```python
from softmax_exploration import softmax, softmax_action_selection

# Q-values for each action
q_values = [1.2, 0.8, 2.1, 0.5]

# Get action probabilities
probabilities = softmax(q_values, temperature=1.0)
print(probabilities)
# Output (approx.): [0.216, 0.145, 0.532, 0.107]

# Select action using softmax
action = softmax_action_selection(q_values, temperature=1.0)
print(f"Selected action: {action}")
```

### Temperature Control

```python
# High temperature = more exploration
probs_high_temp = softmax(q_values, temperature=2.0)
print("High temperature (more exploration):", probs_high_temp)

# Low temperature = more exploitation
probs_low_temp = softmax(q_values, temperature=0.5)
print("Low temperature (more exploitation):", probs_low_temp)
```
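The two limits can be checked numerically. The snippet below uses a local stand-in for `softmax` so it runs standalone (an assumption about the package's formula, shown for illustration):

```python
import numpy as np

def softmax(q_values, temperature=1.0):
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

q_values = [1.2, 0.8, 2.1, 0.5]

# As temperature grows, the distribution approaches uniform...
print(softmax(q_values, temperature=100.0))  # each entry close to 0.25

# ...and as temperature shrinks, it concentrates on the greedy action.
print(softmax(q_values, temperature=0.01))   # almost all mass on index 2
```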

### Epsilon-Softmax Hybrid

```python
from softmax_exploration import epsilon_softmax

# Combine epsilon-greedy with softmax
action = epsilon_softmax(q_values, epsilon=0.1, temperature=1.0)
print(f"Epsilon-softmax action: {action}")
```
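A plausible reading of the hybrid, based on the parameter names in the API reference: with probability `epsilon` take a uniformly random action, otherwise sample from the softmax distribution. This is an assumption about the package's logic, sketched here for clarity:

```python
import numpy as np

def epsilon_softmax(q_values, epsilon=0.1, temperature=1.0, random_state=None):
    """Epsilon-greedy outer loop with softmax sampling as the greedy branch."""
    rng = np.random.default_rng(random_state)
    q = np.asarray(q_values, dtype=float)
    if rng.random() < epsilon:
        # Explore: uniform random action
        return int(rng.integers(len(q)))
    # Exploit softly: sample from the softmax distribution
    z = q / temperature
    z -= z.max()
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q), p=probs))

action = epsilon_softmax([1.2, 0.8, 2.1, 0.5], epsilon=0.1, random_state=0)
```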

### Adaptive Temperature Scheduling

```python
from softmax_exploration import adaptive_temperature

# Temperature decreases over episodes
for episode in [0, 10, 50, 100]:
    temp = adaptive_temperature(episode)
    print(f"Episode {episode}: Temperature = {temp:.3f}")
```
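The defaults in the API reference (`initial_temp=10.0`, `decay_rate=0.995`, `min_temp=0.1`) suggest exponential decay clipped at a floor. A sketch of that scheme (an assumption, not the package source):

```python
def adaptive_temperature(episode, initial_temp=10.0, decay_rate=0.995, min_temp=0.1):
    """Exponentially decay the temperature, never dropping below min_temp."""
    return max(min_temp, initial_temp * decay_rate ** episode)

for episode in [0, 10, 50, 100]:
    print(f"Episode {episode}: Temperature = {adaptive_temperature(episode):.3f}")
```

Under these defaults the schedule starts at 10.0 and decays by 0.5% per episode until it hits the 0.1 floor, so exploration tapers off gradually rather than being switched off.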

### Boltzmann Exploration

```python
from softmax_exploration import boltzmann_exploration

# Boltzmann exploration (same as softmax)
action = boltzmann_exploration(q_values, temperature=1.0)
print(f"Boltzmann action: {action}")
```

## API Reference

### `softmax(q_values, temperature=1.0)`
Compute softmax probabilities for given Q-values.

**Parameters:**
- `q_values`: List or numpy array of Q-values
- `temperature`: Temperature parameter (higher = more exploration)

**Returns:** Probability distribution over actions

### `softmax_action_selection(q_values, temperature=1.0, random_state=None)`
Select an action using softmax exploration.

**Parameters:**
- `q_values`: List or numpy array of Q-values
- `temperature`: Temperature parameter
- `random_state`: Random state for reproducibility

**Returns:** Selected action index
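Selecting an action this way reduces to a weighted draw over action indices; a sketch of how `random_state` could seed reproducible sampling (illustrative, not the package's actual implementation):

```python
import numpy as np

def softmax_action_selection(q_values, temperature=1.0, random_state=None):
    rng = np.random.default_rng(random_state)  # seeded generator for reproducibility
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                               # overflow protection
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))

# Same seed => same action
assert softmax_action_selection([1.2, 0.8, 2.1, 0.5], random_state=42) == \
       softmax_action_selection([1.2, 0.8, 2.1, 0.5], random_state=42)
```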

### `epsilon_softmax(q_values, epsilon=0.1, temperature=1.0, random_state=None)`
Hybrid exploration combining epsilon-greedy with softmax.

**Parameters:**
- `q_values`: List or numpy array of Q-values
- `epsilon`: Probability of random action selection
- `temperature`: Temperature parameter for softmax
- `random_state`: Random state for reproducibility

**Returns:** Selected action index

### `adaptive_temperature(episode, initial_temp=10.0, decay_rate=0.995, min_temp=0.1)`
Compute adaptive temperature for exploration scheduling.

**Parameters:**
- `episode`: Current episode number
- `initial_temp`: Initial temperature value
- `decay_rate`: Temperature decay rate
- `min_temp`: Minimum temperature value

**Returns:** Adaptive temperature value

## Requirements

- Python 3.6+
- NumPy

## Installation from Source

```bash
git clone https://github.com/yourusername/softmax-exploration.git
cd softmax-exploration
pip install -e .
```

## License

This project is open source and available under the MIT License.

## Contributing

Feel free to contribute to this project by submitting issues or pull requests.

            
