| Name | torch-module-monitor |
| Version | 0.1.0 |
| home_page | None |
| Summary | Diagnostics for PyTorch model training - monitor activations, parameters, and gradients |
| upload_time | 2025-10-21 22:28:53 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | MIT License
Copyright (c) 2025 Sebastian Bordt
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
|
| keywords | pytorch, deep-learning, diagnostics, monitoring, training, mup, coordinate-check, activations, gradients |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Monitor the training of your PyTorch modules
[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
A simple research code base to monitor the training of small-to-medium neural networks. Log arbitrary metrics of activations, gradients, and parameters to W&B with a few lines of code!
We also implement the refined coordinate check (RCC) from the NeurIPS 2025 paper ["On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling"](https://arxiv.org/abs/2505.22491) (Haas et al., 2025).
⚡For a complete working example, see how the monitor can be integrated into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).⚡
### Installation
```bash
pip install torch-module-monitor
```
---
## Features
**1. Monitor arbitrary metrics of activations, gradients, and parameters with a few lines of code**
- Add new metrics for activations, gradients, and parameters with a single line of code
- Regex-based filtering to determine what should be logged
- Monitor the internals of the attention operation (query/key/value tensor metrics, attention entropy)
- Aggregation of activation metrics across micro-batches
**2. Perform the Refined Coordinate Check (RCC) from [arXiv:2505.22491](https://arxiv.org/abs/2505.22491)**
- We provide an implementation of the refined coordinate check.
---
## Basic Monitoring
```python
from torch_module_monitor import ModuleMonitor

# Initialize and add metrics
monitor = ModuleMonitor(monitor_step_fn=lambda step: step % 10 == 0)
monitor.set_module(model)

monitor.add_activation_metric("mean", lambda x: x.mean())
monitor.add_parameter_metric("norm", lambda x: x.norm())
monitor.add_gradient_metric("norm", lambda x: x.norm())

# Training loop
for step, (inputs, targets) in enumerate(dataloader):
    monitor.begin_step(step)

    outputs = model(inputs)  # Activations captured via hooks
    loss = criterion(outputs, targets)
    loss.backward()

    monitor.monitor_parameters()
    monitor.monitor_gradients()

    optimizer.step()
    optimizer.zero_grad()
    monitor.end_step()

    # Log metrics
    if monitor.is_step_monitored(step):
        wandb.log(monitor.get_step_metrics())
```
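With gradient accumulation, one optimizer step spans several micro-batches, and the features above mention that activation metrics are aggregated across them. A minimal sketch of what mean aggregation looks like (the `RunningMean` helper and the averaging rule are illustrative, not the library's actual internals):

```python
class RunningMean:
    """Aggregates a scalar metric across micro-batches within one step."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += float(value)
        self.count += 1

    def result(self):
        # Mean of the metric over all micro-batches seen this step.
        return self.total / self.count


# Example: the same activation metric observed on three micro-batches.
agg = RunningMean()
for v in (0.1, 0.3, 0.2):
    agg.update(v)
print(agg.result())
```

The aggregated value, rather than per-micro-batch values, is what ends up in the step's metrics dict.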
## Complete Examples
**See [examples/](examples/) for complete examples:**
- `metrics.ipynb` - Basic metric monitoring
- `reference-model.ipynb` - Reference module comparison
- `refined-coordinate-check.ipynb` - Refined coordinate check
⚡We also show how to integrate the monitor into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).⚡
---
## Integration with Weights & Biases
Metrics are named so that they are grouped and visualized nicely in Weights & Biases.
Log the collected metrics in a single line of code:
```python
wandb.log(training_monitor.get_step_metrics(), step=current_step)
```
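The grouping relies on slash-separated key prefixes: W&B turns each prefix into its own panel section. The key names below are illustrative, not the library's exact output, but the dict shape is what `get_step_metrics()` is described as returning:

```python
from collections import defaultdict

# Hypothetical metrics dict: slash-separated prefixes ("activation/",
# "gradient/", ...) become separate panel sections in W&B.
step_metrics = {
    "activation/transformer.h0.mlp.mean": 0.012,
    "gradient/transformer.h0.mlp.norm": 1.7,
    "parameter/transformer.h0.mlp.norm": 3.4,
}

# Group keys by their prefix, as a W&B dashboard would:
panels = defaultdict(list)
for key in step_metrics:
    section, _, name = key.partition("/")
    panels[section].append(name)
print(dict(panels))
```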
Examples: TODO Provide Links
---
## Patterns
### Regex-Based Module Filtering
You can use a regex to specify that a metric should only be computed for specific tensors.
```python
# Monitor only MLP layers
monitor.add_activation_metric(
    "my_metric", my_metric, metric_regex=r".*mlp.*"
)
```
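Conceptually, the filter matches the pattern against each module's qualified name and skips non-matching modules. A minimal sketch of that matching rule (the `should_monitor` helper and the use of `re.fullmatch` are assumptions for illustration, not the library's documented behavior):

```python
import re


def should_monitor(module_name, metric_regex=None):
    """Return True if a metric with the given regex applies to this module."""
    if metric_regex is None:
        return True  # no filter: the metric applies everywhere
    return re.fullmatch(metric_regex, module_name) is not None


names = ["transformer.h0.attn", "transformer.h0.mlp", "lm_head"]
monitored = [n for n in names if should_monitor(n, r".*mlp.*")]
print(monitored)  # only the MLP module matches
```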
### Reference Module Comparison
In infinite-width theory, we often want to measure how activations and parameters differ from their values at initialization. We implement this via an arbitrary reference model against which the monitored model is compared.
```python
monitor.set_reference_module(reference_model)

# Track drift from initialization
monitor.add_parameter_difference_metric(
    "l2_distance", lambda p, p_ref: (p - p_ref).norm()
)
```
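The quantity the `l2_distance` metric computes is just the Euclidean distance between a parameter and its reference counterpart. A plain-Python sketch of that computation, with made-up parameter values for illustration:

```python
import math


def l2_distance(p, p_ref):
    """Euclidean distance between two parameter vectors (as plain lists)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_ref)))


p_init = [0.5, -0.3, 0.1]  # parameter values at initialization (reference)
p_now = [0.7, -0.1, 0.1]   # the same parameter after some training

drift = l2_distance(p_now, p_init)
print(drift)
```

In practice the reference module is typically a deep copy of the model taken before training starts, so the metric tracks drift from initialization.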
## Complex Modules
By default, we monitor the activations of modules that return a single tensor. To monitor statistics of complex modules, these modules can implement `MonitorMixin`. We use this approach to monitor the internals of the attention operation.
```python
import torch.nn as nn
import torch.nn.functional as F

from torch_module_monitor import MonitorMixin, monitor_scaled_dot_product_attention


class MultiHeadAttention(nn.Module, MonitorMixin):
    def forward(self, x):
        q, k, v = self.compute_qkv(x)
        attn_output = F.scaled_dot_product_attention(q, k, v)

        if self.is_monitoring:
            monitor_scaled_dot_product_attention(
                self.get_module_monitor(), module=self,
                query=q, key=k, value=v, activation=attn_output
            )

        return self.output_projection(attn_output)
```
This logs per-head metrics: `activation/{module}.head_{i}.query`, `attention_entropy/{module}.head_{i}`, etc.
**Custom metrics in any module:**
```python
import torch.nn as nn

from torch_module_monitor import MonitorMixin


class CustomLayer(nn.Module, MonitorMixin):
    def forward(self, x):
        output = self.transform(x)

        if self.is_monitoring:
            self.get_module_monitor().log_tensor("custom_stat", output.norm(dim=-1))

        return output
```
## Multi-GPU Support
In principle, the monitor can support multi-GPU training, though we do not provide direct support for any particular parallelization strategy. With FSDP, for example, every GPU can run its own monitor. However, we do not currently implement the synchronization of activation metrics across GPUs, and the refined coordinate check has only been tested in single-GPU training.
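The missing synchronization step amounts to reducing per-rank metric dicts into one dict before logging from rank 0. A pure-Python sketch of that reduction (the helper name and the choice of averaging are assumptions; a real implementation would use `torch.distributed.all_reduce` on tensors instead of Python dicts):

```python
def average_metric_dicts(per_rank_metrics):
    """Average per-GPU metric dicts with identical keys into one dict."""
    keys = per_rank_metrics[0].keys()
    n = len(per_rank_metrics)
    return {k: sum(m[k] for m in per_rank_metrics) / n for k in keys}


# Hypothetical per-rank metrics from two GPUs running their own monitor:
rank_metrics = [
    {"activation/mlp.mean": 0.10, "gradient/mlp.norm": 1.0},  # GPU 0
    {"activation/mlp.mean": 0.30, "gradient/mlp.norm": 3.0},  # GPU 1
]

avg = average_metric_dicts(rank_metrics)
print(avg)
```

Whether averaging is the right reduction depends on the metric (e.g. a max-based metric would need a max reduction), which is one reason this is left to the user.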
---
## Citation
If you use this code, please cite:
```bibtex
@inproceedings{haas2025splargelr,
  title={On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling},
  author={Haas, Moritz and Bordt, Sebastian and von Luxburg, Ulrike and Vankadara, Leena Chennuru},
  booktitle={Advances in Neural Information Processing Systems 38},
  year={2025}
}
```
## Contributing
We provide this code as-is. We may accept pull requests that fix bugs or add new features.
Raw data
{
"_id": null,
"home_page": null,
"name": "torch-module-monitor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pytorch, deep-learning, diagnostics, monitoring, training, mup, coordinate-check, activations, gradients",
"author": null,
"author_email": "Sebastian Bordt <sbordt@posteo.de>",
"download_url": "https://files.pythonhosted.org/packages/7c/9a/d06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591/torch_module_monitor-0.1.0.tar.gz",
"platform": null,
"description": "# Monitor the training of your PyTorch modules\n\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/MIT)\n\nA simple research code base to monitor the training of small-to-medium neural networks. Log arbitrary metrics of activations, gradients, and parameters to W&B with a few lines of code!\n\nWe also implement the refined coordinate check (RCC) from the NeurIPS 2025 paper [\"On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling\"](https://arxiv.org/abs/2505.22491) (Haas et al., 2025).\n\n\u26a1For a complete working example, see how the monitor can be integrated into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).\u26a1\n\n### Installation\n\n```bash\npip install torch-module-monitor\n```\n\n---\n\n## Features\n\n**1. Monitor arbitrary metrics of activations, gradients, and parameters with a few lines of code**\n- Add new metrics for activations, gradients, and parameters with a single line of code.\n- Regex-based filtering to determine what should be logged\n- Monitor the internals of the attention operation (query/key/value tensor metrics, attention entropy)\n- Aggregation of activation metrics across micro-batches\n\n**2. 
Perform the Refined Coordinate Check (RCC) from https://arxiv.org/abs/2505.22491**\n- We provide an implementation of the refined coordinate check.\n\n---\n\n## Basic Monitoring\n\n```python\nfrom torch_module_monitor import ModuleMonitor\n\n# Initialize and add metrics\nmonitor = ModuleMonitor(monitor_step_fn=lambda step: step % 10 == 0)\nmonitor.set_module(model)\n\nmonitor.add_activation_metric(\"mean\", lambda x: x.mean())\nmonitor.add_parameter_metric(\"norm\", lambda x: x.norm())\nmonitor.add_gradient_metric(\"norm\", lambda x: x.norm())\n\n# Training loop\nfor step, (inputs, targets) in enumerate(dataloader):\n monitor.begin_step(step)\n\n outputs = model(inputs) # Activations captured via hooks\n loss = criterion(outputs, targets)\n loss.backward()\n\n monitor.monitor_parameters()\n monitor.monitor_gradients()\n\n optimizer.step()\n optimizer.zero_grad()\n monitor.end_step()\n\n # Log metrics\n if monitor.is_step_monitored(step):\n wandb.log(monitor.get_step_metrics())\n```\n\n## Complete Examples\n\n**See [examples/](examples/) for complete examples:**\n- `metrics.ipynb` - Basic metric monitoring\n- `reference-model.ipynb` - Reference module comparison\n- `refined-coordinate-check.ipynb` - Refined coordinate check\n\n\u26a1We also show how to integrate the monitor into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).\u26a1\n\n\n---\n\n## Integration with Weights & Biases\n\nWe name the different metrics such that they are nicely visualized in Weights & Biases.\n\nLog the collected metrics in a single line of code:\n\n```python\nwandb.log(training_monitor.get_step_metrics(), step=current_step)\n```\n\nExamples: TODO Provide Links\n\n---\n\n## Patterns\n\n### Regex-Based Module Filtering\n\nYou can use a regex to specify that a metric should only be computed for specific tensors.\n\n```python\n# Monitor only attention layers\nmonitor.add_activation_metric(\n \"my_metric\", my_metric(x), metric_regex=r\".*mlp.*\"\n)\n```\n\n### Reference Module 
Comparison\n\nIn infinite width theory, we often want to measure the difference of activations and parameters to the model at initialization. We implement this via an arbitrary reference model to which our model can be compared.\n\n```python\nmonitor.set_reference_module(reference_model)\n\n# Track drift from initialization\nmonitor.add_parameter_difference_metric(\n \"l2_distance\", lambda p, p_ref: (p - p_ref).norm()\n)\n```\n\n## Complex Modules\n\nBy default, we monitor the activations of modules that return a single tensor. To monitor statistics of complex modules, these modules can implement `MonitorMixin`. We use this approach to monitor the internals of the attention operation. \n\n```python\nfrom torch_module_monitor import MonitorMixin, monitor_scaled_dot_product_attention\n\nclass MultiHeadAttention(nn.Module, MonitorMixin):\n def forward(self, x):\n q, k, v = self.compute_qkv(x)\n attn_output = F.scaled_dot_product_attention(q, k, v)\n\n if self.is_monitoring:\n monitor_scaled_dot_product_attention(\n self.get_module_monitor(), module=self,\n query=q, key=k, value=v, activation=attn_output\n )\n\n return self.output_projection(attn_output)\n```\n\nThis logs per-head metrics: `activation/{module}.head_{i}.query`, `attention_entropy/{module}.head_{i}`, etc.\n\n**Custom metrics in any module:**\n\n```python\nfrom torch_module_monitor import MonitorMixin\n\nclass CustomLayer(nn.Module, MonitorMixin):\n def forward(self, x):\n output = self.transform(x)\n\n if self.is_monitoring:\n self.get_module_monitor().log_tensor(\"custom_stat\", output.norm(dim=-1))\n\n return output\n```\n\n## Multi-GPU Support\n\nIn principle, the monitor can support multi-GPU training, though we do not provide direct support for any parallelization strategy. With FSDP, for example, every GPU could have its own monitor. However, we do not currently implement the synchronization of activation metrics across GPUs. The refined coordinate check was only tested for single-GPU training. 
\n\n---\n\n## Citation\n\nIf you use this code, please cite:\n\n```bibtex\n@inproceedings{haas2025splargelr,\n title={On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling},\n author={Haas, Moritz and Bordt, Sebastian and von Luxburg, Ulrike and Vankadara, Leena Chennuru},\n booktitle={Advances in Neural Information Processing Systems 38},\n year={2025}\n}\n```\n\n## Contributing\n\nWe provide this code as-is. We may accept pull requests that fix bugs or add new features. \n\n\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2025 Sebastian Bordt\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.\n ",
"summary": "Diagnostics for PyTorch model training - monitor activations, parameters, and gradients",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://torch-module-monitor.readthedocs.io",
"Homepage": "https://github.com/tml-tuebingen/torch-module-monitor",
"Issues": "https://github.com/tml-tuebingen/torch-module-monitor/issues",
"Source": "https://github.com/tml-tuebingen/torch-module-monitor"
},
"split_keywords": [
"pytorch",
" deep-learning",
" diagnostics",
" monitoring",
" training",
" mup",
" coordinate-check",
" activations",
" gradients"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9cde7e7ac9a84aef174a128e2d1e7c71a356cbfb4c030ac4e405308fa40ffea5",
"md5": "6370df8ad4630dd79d6239c89fb6b9ad",
"sha256": "bf723f2ac6fc1f227431c8c76ced268a36675aa47869f151888340aec8831cd4"
},
"downloads": -1,
"filename": "torch_module_monitor-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6370df8ad4630dd79d6239c89fb6b9ad",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 21036,
"upload_time": "2025-10-21T22:28:50",
"upload_time_iso_8601": "2025-10-21T22:28:50.225072Z",
"url": "https://files.pythonhosted.org/packages/9c/de/7e7ac9a84aef174a128e2d1e7c71a356cbfb4c030ac4e405308fa40ffea5/torch_module_monitor-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7c9ad06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591",
"md5": "53ece6032c4475c0358570cdf9f82784",
"sha256": "cdaec9bff1d6e7f494c31f53dd77a65a01a218969ca9e6ebd56c83cbb7554cbd"
},
"downloads": -1,
"filename": "torch_module_monitor-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "53ece6032c4475c0358570cdf9f82784",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 26655,
"upload_time": "2025-10-21T22:28:53",
"upload_time_iso_8601": "2025-10-21T22:28:53.182399Z",
"url": "https://files.pythonhosted.org/packages/7c/9a/d06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591/torch_module_monitor-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 22:28:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tml-tuebingen",
"github_project": "torch-module-monitor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "torch-module-monitor"
}