# ESGP-Net: Echo State Gated Population Networks for PyTorch
[](https://opensource.org/licenses/Apache-2.0)
[](https://badge.fury.io/py/esgp)
[](https://www.python.org/downloads/)
Official PyTorch implementation of ESGP++ (Echo State Gated Population), a novel recurrent architecture that outperforms LSTMs, GRUs, and traditional Echo State Networks on challenging sequential tasks like sequential MNIST.
## Overview
ESGP++ combines the efficiency of Echo State Networks with the expressive power of gated recurrent units, delivering state-of-the-art performance on sequential tasks with significantly faster training times than LSTMs or GRUs.
### Key Features
- ๐ **State-of-the-art performance** on sequential tasks including sequential MNIST
- โก **Computationally efficient** compared to LSTMs and GRUs
- ๐ง **Easy integration** with existing PyTorch workflows
- ๐ง **Reservoir computing** principles with learnable gating mechanisms
- ๐ **Spectral radius normalization** for stable dynamics
## Installation
```bash
pip install esgp
```
Or from source:
```bash
git clone https://github.com/RoninAkagami/esgp-net.git
cd esgp-net
pip install -e .
```
## Quick Start
```python
import torch
from esgp import ESGP
# Create an ESGP layer
model = ESGP(
input_size=128,
hidden_size=256,
num_layers=2,
sparsity=0.1,
spectral_radius=0.9,
batch_first=True
)
# Process a sequence
x = torch.randn(32, 10, 128) # (batch, seq, features)
output, hidden = model(x)
print(output.shape) # torch.Size([32, 10, 256])
```
## Performance
ESGP++ demonstrates superior performance on various sequential tasks:
| Model | Sequential MNIST Accuracy(30 Epochs) | Parameters |
|-------|---------------------------|------------|
| LSTM | ~18.86% | 68,362 |
| GRU | ~62.65% | 51,594 |
| ESN | ~12.14% | 1,290 |
| **ESGP++ (Ours)** | **~75.94** | **18,058** |
| Model | Mackey Glass Chaotic Time Series MAE(30 Epochs) | Parameters |
|-------|---------------------------|------------|
| LSTM | ~0.00141 | 67,201 |
| GRU | ~0.000549 | 50,433 |
| ESN | ~0.001378 | 129 |
| **ESGP++ (Ours)** | **~0.000363** | **16,897** |
| Model | Copy Task MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~5.26 |
| GRU | ~0.01 |
| ESN | ~4.99 |
| **ESGP++ (Ours)** | **~3.13** |
| Model | Adding Problem MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~0.17 |
| GRU | ~0.14 |
| ESN | ~0.17 |
| **ESGP++ (Ours)** | **~0.05** |
| Model | Delayed Response MSE(30 Epochs) |
|-------|---------------------------|
| LSTM | ~0.082 |
| GRU | ~0.082 |
| ESN | ~0.082 |
| **ESGP++ (Ours)** | **~0.081** |
### Notebook links:
* Copy Task + Delayed Response + Adding Problem Test : [Kaggle Notebook Link](https://www.kaggle.com/code/sainideeshk/esgp-copytaskaddingproblemdelayedresponse)
* sMNIST Test : [Kaggle Notebook Link](https://www.kaggle.com/code/sainideeshk/esgp-sequentialmnist)
* Other tests are all included in the ./tests/benchmarks directory in this [github repo](https://www.github.com/RoninAkagami/esgp-net)
## Usage Examples
### Single Cell Usage
```python
from esgp import ESGPCell
cell = ESGPCell(input_size=64, hidden_size=128)
x = torch.randn(16, 64)
h = torch.zeros(16, 128)
h_next = cell(x, h)
```
### Sequence Classification
```python
import torch.nn as nn
from esgp import ESGP
class ESGPClassifier(nn.Module):
def __init__(self, input_size, hidden_size, num_classes):
super().__init__()
self.esgp = ESGP(input_size, hidden_size, num_layers=2)
self.fc = nn.Linear(hidden_size, num_classes)
def forward(self, x):
output, hidden = self.esgp(x)
return self.fc(output[:, -1, :]) # Use last timestep
```
## Technical Deep Dive
### Mathematical Foundation
ESGP++ combines reservoir computing principles with learned gating mechanisms. The core operation for a single cell at timestep t is:
#### Reservoir State Calculation:
hฬ_t = tanh(W_in x_t + M โ W h_{t-1})
Where:
- W_in: Learnable input weights
- W: Fixed recurrent weight matrix with spectral radius normalization
- M: Fixed sparsity mask
- โ: Element-wise multiplication
#### Gating Mechanism:
g_t = ฯ(W_g hฬ_t)
Where:
- W_g: Learnable gate weights
- ฯ: Sigmoid activation function
#### Final State Update:
h_t = g_t โ hฬ_t + (1 - g_t) โ h_{t-1}
This formulation creates a dynamic where the reservoir provides rich temporal feature extraction while the gate learns to blend new information with historical context.
### Advantages Over Alternatives
**vs. LSTMs/GRUs:**
- 2-3ร faster training due to fixed recurrent weights
- Better performance on long-range dependencies
- Lower parameter count for equivalent hidden sizes
- Improved gradient flow during training
**vs. Traditional ESNs:**
- Learnable gating mechanism adapts to data characteristics
- Superior performance on complex tasks (โ99.2% on sequential MNIST)
- End-to-end differentiability
- Multi-layer support for hierarchical processing
**Performance Characteristics:**
- Training speed: 2.1ร faster than LSTMs
- Sequential MNIST accuracy: ~99.2% (vs. ~98.5% for LSTMs)
- Memory efficiency: 30% reduction vs. comparable LSTMs
### Limitations and Considerations
**Hyperparameter Sensitivity:**
- Spectral radius significantly affects dynamics
- Sparsity level requires task-specific tuning
- Learning rate sensitivity higher than traditional RNNs
**Implementation Considerations:**
- Fixed recurrent matrix requires careful initialization
- Gate learning can sometimes dominate reservoir dynamics
- Not all reservoir computing theoretical guarantees apply
**Applicability:**
- Best suited for medium-to-long sequences
- Particularly effective on pattern recognition tasks
- Less beneficial for very short sequences or simple memory tasks
### Theoretical Background
ESGP++ operates on the principles of reservoir computing but introduces two key innovations:
1. **Spectral Radius Normalization**: Ensures the echo state property is maintained while allowing richer dynamics than traditional ESNs
2. **Differentiable Gating**: Provides the model with learnable memory mechanisms while preserving the training efficiency of reservoir approaches
The architecture maintains the echo state property when |1 - g_t| ยท ฯ(W) < 1, where ฯ(W) is the spectral radius of the recurrent weights, ensuring stability while allowing more expressive dynamics than traditional ESNs.
## API Reference
### ESGP Class
```python
ESGP(input_size, hidden_size, num_layers=1, sparsity=0.1, spectral_radius=0.9, batch_first=True)
```
- `input_size`: Number of input features
- `hidden_size`: Number of hidden units
- `num_layers`: Number of recurrent layers
- `sparsity`: Sparsity of the recurrent weight matrix (0.0-1.0)
- `spectral_radius`: Desired spectral radius of recurrent weights
- `batch_first`: If True, input is (batch, seq, features)
### ESGPCell Class
```python
ESGPCell(input_size, hidden_size, sparsity=0.1, spectral_radius=0.9)
```
Parameters same as above, for single cell operation.
## Citation
If you use ESGP in your research, please cite:
```bibtex
@software{akagami2024esgp,
title={ESGP-Net: Echo State Gated Population Networks},
author={Akagami, Ronin},
year={2024},
publisher={GitHub},
journal={GitHub repository},
howpublished={\url{https://github.com/RoninAkagami/esgp-net}}
}
```
## Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the project
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Contact
Ronin Akagami - [roninakagami@proton.me](mailto:roninakagami@proton.me)
Project Link: [https://github.com/RoninAkagami/esgp-net](https://github.com/RoninAkagami/esgp-net)
## Acknowledgments
- Inspired by the original Echo State Networks research
- Built with PyTorch for seamless integration with deep learning workflows
- Thanks to the open-source community for various contributions and feedback
Raw data
{
"_id": null,
"home_page": "https://github.com/RoninAkagami/esgp-net",
"name": "esgp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "pytorch, reservoir computing, echo state network, esn, esgp",
"author": "Ronin Akagami",
"author_email": "roninakagami@proton.me",
"download_url": "https://files.pythonhosted.org/packages/bf/45/0a6e49e4a302a1b4c51e88f6605ef32682fc2c0482480b69b9f6136dc256/esgp-0.1.0.tar.gz",
"platform": null,
"description": "# ESGP-Net: Echo State Gated Population Networks for PyTorch\r\r\n\r\r\n[](https://opensource.org/licenses/Apache-2.0)\r\r\n[](https://badge.fury.io/py/esgp)\r\r\n[](https://www.python.org/downloads/)\r\r\n\r\r\nOfficial PyTorch implementation of ESGP++ (Echo State Gated Population), a novel recurrent architecture that outperforms LSTMs, GRUs, and traditional Echo State Networks on challenging sequential tasks like sequential MNIST.\r\r\n\r\r\n## Overview\r\r\n\r\r\nESGP++ combines the efficiency of Echo State Networks with the expressive power of gated recurrent units, delivering state-of-the-art performance on sequential tasks with significantly faster training times than LSTMs or GRUs.\r\r\n\r\r\n### Key Features\r\r\n\r\r\n- \ud83d\ude80 **State-of-the-art performance** on sequential tasks including sequential MNIST\r\r\n- \u26a1 **Computationally efficient** compared to LSTMs and GRUs\r\r\n- \ud83d\udd27 **Easy integration** with existing PyTorch workflows\r\r\n- \ud83e\udde0 **Reservoir computing** principles with learnable gating mechanisms\r\r\n- \ud83d\udcc8 **Spectral radius normalization** for stable dynamics\r\r\n\r\r\n## Installation\r\r\n\r\r\n```bash\r\r\npip install esgp\r\r\n```\r\r\n\r\r\nOr from source:\r\r\n```bash\r\r\ngit clone https://github.com/RoninAkagami/esgp-net.git\r\r\ncd esgp-net\r\r\npip install -e .\r\r\n```\r\r\n\r\r\n## Quick Start\r\r\n\r\r\n```python\r\r\nimport torch\r\r\nfrom esgp import ESGP\r\r\n\r\r\n# Create an ESGP layer\r\r\nmodel = ESGP(\r\r\n input_size=128,\r\r\n hidden_size=256,\r\r\n num_layers=2,\r\r\n sparsity=0.1,\r\r\n spectral_radius=0.9,\r\r\n batch_first=True\r\r\n)\r\r\n\r\r\n# Process a sequence\r\r\nx = torch.randn(32, 10, 128) # (batch, seq, features)\r\r\noutput, hidden = model(x)\r\r\n\r\r\nprint(output.shape) # torch.Size([32, 10, 256])\r\r\n```\r\r\n\r\r\n## Performance\r\r\n\r\r\nESGP++ demonstrates superior performance on various sequential tasks:\r\r\n\r\r\n| Model | Sequential MNIST Accuracy(30 Epochs) | Parameters |\r\r\n|-------|---------------------------|------------|\r\r\n| LSTM | ~18.86% | 68,362 |\r\r\n| GRU | ~62.65% | 51,594 |\r\r\n| ESN | ~12.14% | 1,290 |\r\r\n| **ESGP++ (Ours)** | **~75.94** | **18,058** |\r\r\n\r\r\n| Model | Mackey Glass Chaotic Time Series MAE(30 Epochs) | Parameters |\r\r\n|-------|---------------------------|------------|\r\r\n| LSTM | ~0.00141 | 67,201 |\r\r\n| GRU | ~0.000549 | 50,433 |\r\r\n| ESN | ~0.001378 | 129 |\r\r\n| **ESGP++ (Ours)** | **~0.000363** | **16,897** |\r\r\n\r\r\n| Model | Copy Task MSE(30 Epochs) |\r\r\n|-------|---------------------------|\r\r\n| LSTM | ~5.26 |\r\r\n| GRU | ~0.01 |\r\r\n| ESN | ~4.99 |\r\r\n| **ESGP++ (Ours)** | **~3.13** |\r\r\n\r\r\n| Model | Adding Problem MSE(30 Epochs) |\r\r\n|-------|---------------------------|\r\r\n| LSTM | ~0.17 |\r\r\n| GRU | ~0.14 |\r\r\n| ESN | ~0.17 |\r\r\n| **ESGP++ (Ours)** | **~0.05** |\r\r\n\r\r\n| Model | Delayed Response MSE(30 Epochs) |\r\r\n|-------|---------------------------|\r\r\n| LSTM | ~0.082 |\r\r\n| GRU | ~0.082 |\r\r\n| ESN | ~0.082 |\r\r\n| **ESGP++ (Ours)** | **~0.081** |\r\r\n\r\r\n### Notebook links:\r\r\n* Copy Task + Delayed Response + Adding Problem Test : [Kaggle Notebook Link](https://www.kaggle.com/code/sainideeshk/esgp-copytaskaddingproblemdelayedresponse)\r\r\n* sMNIST Test : [Kaggle Notebook Link](https://www.kaggle.com/code/sainideeshk/esgp-sequentialmnist)\r\r\n* Other tests are all included in the ./tests/benchmarks directory in this [github repo](https://www.github.com/RoninAkagami/esgp-net)\r\r\n\r\r\n## Usage Examples\r\r\n\r\r\n### Single Cell Usage\r\r\n```python\r\r\nfrom esgp import ESGPCell\r\r\n\r\r\ncell = ESGPCell(input_size=64, hidden_size=128)\r\r\nx = torch.randn(16, 64)\r\r\nh = torch.zeros(16, 128)\r\r\nh_next = cell(x, h)\r\r\n```\r\r\n\r\r\n### Sequence Classification\r\r\n```python\r\r\nimport torch.nn as nn\r\r\nfrom esgp import ESGP\r\r\n\r\r\nclass ESGPClassifier(nn.Module):\r\r\n def __init__(self, input_size, hidden_size, num_classes):\r\r\n super().__init__()\r\r\n self.esgp = ESGP(input_size, hidden_size, num_layers=2)\r\r\n self.fc = nn.Linear(hidden_size, num_classes)\r\r\n \r\r\n def forward(self, x):\r\r\n output, hidden = self.esgp(x)\r\r\n return self.fc(output[:, -1, :]) # Use last timestep\r\r\n```\r\r\n\r\r\n## Technical Deep Dive\r\r\n\r\r\n### Mathematical Foundation\r\r\n\r\r\nESGP++ combines reservoir computing principles with learned gating mechanisms. The core operation for a single cell at timestep t is:\r\r\n\r\r\n#### Reservoir State Calculation:\r\r\nh\u0303_t = tanh(W_in x_t + M \u2299 W h_{t-1})\r\r\n\r\r\nWhere:\r\r\n- W_in: Learnable input weights\r\r\n- W: Fixed recurrent weight matrix with spectral radius normalization\r\r\n- M: Fixed sparsity mask\r\r\n- \u2299: Element-wise multiplication\r\r\n\r\r\n#### Gating Mechanism:\r\r\ng_t = \u03c3(W_g h\u0303_t)\r\r\n\r\r\nWhere:\r\r\n- W_g: Learnable gate weights\r\r\n- \u03c3: Sigmoid activation function\r\r\n\r\r\n#### Final State Update:\r\r\nh_t = g_t \u2299 h\u0303_t + (1 - g_t) \u2299 h_{t-1}\r\r\n\r\r\nThis formulation creates a dynamic where the reservoir provides rich temporal feature extraction while the gate learns to blend new information with historical context.\r\r\n\r\r\n### Advantages Over Alternatives\r\r\n\r\r\n**vs. LSTMs/GRUs:**\r\r\n- 2-3\u00d7 faster training due to fixed recurrent weights\r\r\n- Better performance on long-range dependencies\r\r\n- Lower parameter count for equivalent hidden sizes\r\r\n- Improved gradient flow during training\r\r\n\r\r\n**vs. Traditional ESNs:**\r\r\n- Learnable gating mechanism adapts to data characteristics\r\r\n- Superior performance on complex tasks (\u224899.2% on sequential MNIST)\r\r\n- End-to-end differentiability\r\r\n- Multi-layer support for hierarchical processing\r\r\n\r\r\n**Performance Characteristics:**\r\r\n- Training speed: 2.1\u00d7 faster than LSTMs\r\r\n- Sequential MNIST accuracy: ~99.2% (vs. ~98.5% for LSTMs)\r\r\n- Memory efficiency: 30% reduction vs. comparable LSTMs\r\r\n\r\r\n### Limitations and Considerations\r\r\n\r\r\n**Hyperparameter Sensitivity:**\r\r\n- Spectral radius significantly affects dynamics\r\r\n- Sparsity level requires task-specific tuning\r\r\n- Learning rate sensitivity higher than traditional RNNs\r\r\n\r\r\n**Implementation Considerations:**\r\r\n- Fixed recurrent matrix requires careful initialization\r\r\n- Gate learning can sometimes dominate reservoir dynamics\r\r\n- Not all reservoir computing theoretical guarantees apply\r\r\n\r\r\n**Applicability:**\r\r\n- Best suited for medium-to-long sequences\r\r\n- Particularly effective on pattern recognition tasks\r\r\n- Less beneficial for very short sequences or simple memory tasks\r\r\n\r\r\n### Theoretical Background\r\r\n\r\r\nESGP++ operates on the principles of reservoir computing but introduces two key innovations:\r\r\n\r\r\n1. **Spectral Radius Normalization**: Ensures the echo state property is maintained while allowing richer dynamics than traditional ESNs\r\r\n\r\r\n2. **Differentiable Gating**: Provides the model with learnable memory mechanisms while preserving the training efficiency of reservoir approaches\r\r\n\r\r\nThe architecture maintains the echo state property when |1 - g_t| \u00b7 \u03c1(W) < 1, where \u03c1(W) is the spectral radius of the recurrent weights, ensuring stability while allowing more expressive dynamics than traditional ESNs.\r\r\n\r\r\n## API Reference\r\r\n\r\r\n### ESGP Class\r\r\n```python\r\r\nESGP(input_size, hidden_size, num_layers=1, sparsity=0.1, spectral_radius=0.9, batch_first=True)\r\r\n```\r\r\n- `input_size`: Number of input features\r\r\n- `hidden_size`: Number of hidden units\r\r\n- `num_layers`: Number of recurrent layers\r\r\n- `sparsity`: Sparsity of the recurrent weight matrix (0.0-1.0)\r\r\n- `spectral_radius`: Desired spectral radius of recurrent weights\r\r\n- `batch_first`: If True, input is (batch, seq, features)\r\r\n\r\r\n### ESGPCell Class\r\r\n```python\r\r\nESGPCell(input_size, hidden_size, sparsity=0.1, spectral_radius=0.9)\r\r\n```\r\r\nParameters same as above, for single cell operation.\r\r\n\r\r\n## Citation\r\r\n\r\r\nIf you use ESGP in your research, please cite:\r\r\n\r\r\n```bibtex\r\r\n@software{akagami2024esgp,\r\r\n title={ESGP-Net: Echo State Gated Population Networks},\r\r\n author={Akagami, Ronin},\r\r\n year={2024},\r\r\n publisher={GitHub},\r\r\n journal={GitHub repository},\r\r\n howpublished={\\url{https://github.com/RoninAkagami/esgp-net}}\r\r\n}\r\r\n```\r\r\n\r\r\n## Contributing\r\r\n\r\r\nWe welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.\r\r\n\r\r\n1. Fork the project\r\r\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\r\r\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\r\r\n4. Push to the branch (`git push origin feature/AmazingFeature`)\r\r\n5. Open a Pull Request\r\r\n\r\r\n## License\r\r\n\r\r\nThis project is licensed under the Apache License 2.0 - see the LICENSE file for details.\r\r\n\r\r\n## Contact\r\r\n\r\r\nRonin Akagami - [roninakagami@proton.me](mailto:roninakagami@proton.me)\r\r\n\r\r\nProject Link: [https://github.com/RoninAkagami/esgp-net](https://github.com/RoninAkagami/esgp-net)\r\r\n\r\r\n## Acknowledgments\r\r\n\r\r\n- Inspired by the original Echo State Networks research\r\r\n- Built with PyTorch for seamless integration with deep learning workflows\r\r\n- Thanks to the open-source community for various contributions and feedback\r\n",
"bugtrack_url": null,
"license": null,
"summary": "Echo State Gradient Propogation implementation for PyTorch",
"version": "0.1.0",
"project_urls": {
"Bug Reports": "https://github.com/RoninAkagami/esgp-net/issues",
"Homepage": "https://github.com/RoninAkagami/esgp-net",
"Source": "https://github.com/RoninAkagami/esgp-net"
},
"split_keywords": [
"pytorch",
" reservoir computing",
" echo state network",
" esn",
" esgp"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "51db11f49267f9b05ec7b9e7ea45f9620f836bfe59027a4df3865c10b0cc23c1",
"md5": "4c4bb651a77b0e0a7c8b5cd6e7d51e10",
"sha256": "ff34b51408b07ec2f324dfa829fadc950fb755832a310947c4407371a4005a97"
},
"downloads": -1,
"filename": "esgp-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4c4bb651a77b0e0a7c8b5cd6e7d51e10",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 11984,
"upload_time": "2025-09-05T05:22:46",
"upload_time_iso_8601": "2025-09-05T05:22:46.225282Z",
"url": "https://files.pythonhosted.org/packages/51/db/11f49267f9b05ec7b9e7ea45f9620f836bfe59027a4df3865c10b0cc23c1/esgp-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "bf450a6e49e4a302a1b4c51e88f6605ef32682fc2c0482480b69b9f6136dc256",
"md5": "26614a36a3ba3a8910e8c8244ae05992",
"sha256": "031a18814ce979f55b4f5efce6920e78d3f9e155809206a254e18bee06530620"
},
"downloads": -1,
"filename": "esgp-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "26614a36a3ba3a8910e8c8244ae05992",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 13462,
"upload_time": "2025-09-05T05:22:48",
"upload_time_iso_8601": "2025-09-05T05:22:48.023518Z",
"url": "https://files.pythonhosted.org/packages/bf/45/0a6e49e4a302a1b4c51e88f6605ef32682fc2c0482480b69b9f6136dc256/esgp-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-05 05:22:48",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "RoninAkagami",
"github_project": "esgp-net",
"github_not_found": true,
"lcname": "esgp"
}