# Softmax Exploration Package
A Python package for implementing softmax exploration strategies in reinforcement learning algorithms.
## Installation
```bash
pip install softmax-exploration
```
## Features
- **Softmax Action Selection**: Convert Q-values to action probabilities using softmax function
- **Boltzmann Exploration**: Temperature-controlled exploration strategy
- **Epsilon-Softmax**: Hybrid approach combining epsilon-greedy with softmax
- **Adaptive Temperature**: Dynamic temperature scheduling for exploration decay
- **Numerical Stability**: Robust implementation with overflow protection
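The "overflow protection" mentioned above usually means subtracting the maximum Q-value before exponentiating, which leaves the probabilities unchanged but keeps `exp()` from overflowing. A minimal sketch of that technique (not the package's actual source):

```python
import numpy as np

def stable_softmax(q_values, temperature=1.0):
    """Softmax with overflow protection via max-subtraction."""
    q = np.asarray(q_values, dtype=float) / temperature
    q -= q.max()  # shift so the largest logit is 0; exp() cannot overflow
    exp_q = np.exp(q)
    return exp_q / exp_q.sum()
```

Because `softmax(x) == softmax(x - c)` for any constant `c`, the shift is mathematically free.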
## Usage
### Basic Softmax Exploration
```python
from softmax_exploration import softmax, softmax_action_selection
# Q-values for each action
q_values = [1.2, 0.8, 2.1, 0.5]
# Get action probabilities
probabilities = softmax(q_values, temperature=1.0)
print(probabilities)
# Output (approx.): [0.216, 0.145, 0.532, 0.107]
# Select action using softmax
action = softmax_action_selection(q_values, temperature=1.0)
print(f"Selected action: {action}")
```
### Temperature Control
```python
# High temperature = more exploration
probs_high_temp = softmax(q_values, temperature=2.0)
print("High temperature (more exploration):", probs_high_temp)
# Low temperature = more exploitation
probs_low_temp = softmax(q_values, temperature=0.5)
print("Low temperature (more exploitation):", probs_low_temp)
```
### Epsilon-Softmax Hybrid
```python
from softmax_exploration import epsilon_softmax
# Combine epsilon-greedy with softmax
action = epsilon_softmax(q_values, epsilon=0.1, temperature=1.0)
print(f"Epsilon-softmax action: {action}")
```
### Adaptive Temperature Scheduling
```python
from softmax_exploration import adaptive_temperature
# Temperature decreases over episodes
for episode in [0, 10, 50, 100]:
    temp = adaptive_temperature(episode)
    print(f"Episode {episode}: Temperature = {temp:.3f}")
```
### Boltzmann Exploration
```python
from softmax_exploration import boltzmann_exploration
# Boltzmann exploration (same as softmax)
action = boltzmann_exploration(q_values, temperature=1.0)
print(f"Boltzmann action: {action}")
```
## API Reference
### `softmax(q_values, temperature=1.0)`
Compute softmax probabilities for given Q-values.
**Parameters:**
- `q_values`: List or numpy array of Q-values
- `temperature`: Temperature parameter (higher = more exploration)
**Returns:** Probability distribution over actions
### `softmax_action_selection(q_values, temperature=1.0, random_state=None)`
Select an action using softmax exploration.
**Parameters:**
- `q_values`: List or numpy array of Q-values
- `temperature`: Temperature parameter
- `random_state`: Random state for reproducibility
**Returns:** Selected action index
### `epsilon_softmax(q_values, epsilon=0.1, temperature=1.0, random_state=None)`
Hybrid exploration combining epsilon-greedy with softmax.
**Parameters:**
- `q_values`: List or numpy array of Q-values
- `epsilon`: Probability of random action selection
- `temperature`: Temperature parameter for softmax
- `random_state`: Random state for reproducibility
**Returns:** Selected action index
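One plausible way to combine the two strategies described above (an illustrative sketch under assumed semantics, not necessarily the package's implementation): with probability `epsilon`, pick a uniformly random action; otherwise, sample from the softmax distribution over Q-values.

```python
import numpy as np

def epsilon_softmax_sketch(q_values, epsilon=0.1, temperature=1.0, random_state=None):
    """With probability epsilon act uniformly at random, else sample from softmax."""
    rng = np.random.default_rng(random_state)
    q = np.asarray(q_values, dtype=float)
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))  # uniform random exploration
    logits = q / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q), p=probs))  # softmax sampling
```

Passing a seed via `random_state` makes the selection reproducible, mirroring the parameter in the signature above.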
### `adaptive_temperature(episode, initial_temp=10.0, decay_rate=0.995, min_temp=0.1)`
Compute adaptive temperature for exploration scheduling.
**Parameters:**
- `episode`: Current episode number
- `initial_temp`: Initial temperature value
- `decay_rate`: Temperature decay rate
- `min_temp`: Minimum temperature value
**Returns:** Adaptive temperature value
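Given the default parameters above, a common schedule consistent with them is exponential decay floored at `min_temp`. This is an assumption about the formula (check the package source), shown here as a sketch:

```python
def adaptive_temperature_sketch(episode, initial_temp=10.0, decay_rate=0.995, min_temp=0.1):
    """Exponentially decay the temperature per episode, floored at min_temp."""
    return max(min_temp, initial_temp * decay_rate ** episode)
```

Early episodes thus get high temperatures (broad exploration) that smoothly anneal toward near-greedy behavior.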
## Requirements
- Python 3.6+
- NumPy
## Installation from Source
```bash
git clone https://github.com/yourusername/softmax-exploration.git
cd softmax-exploration
pip install -e .
```
## License
This project is open source and available under the MIT License.
## Contributing
Contributions are welcome: please submit issues or pull requests on GitHub.