| Name | torch-module-monitor |
| Version | 0.1.0 |
| home_page | None |
| Summary | Diagnostics for PyTorch model training - monitor activations, parameters, and gradients |
| upload_time | 2025-10-21 22:28:53 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | MIT License
Copyright (c) 2025 Sebastian Bordt
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
|
| keywords | pytorch, deep-learning, diagnostics, monitoring, training, mup, coordinate-check, activations, gradients |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Monitor the training of your PyTorch modules
[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
A simple research code base to monitor the training of small-to-medium neural networks. Log arbitrary metrics of activations, gradients, and parameters to W&B with a few lines of code!
We also implement the refined coordinate check (RCC) from the NeurIPS 2025 paper ["On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling"](https://arxiv.org/abs/2505.22491) (Haas et al., 2025).
⚡For a complete working example, see how the monitor can be integrated into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).⚡
### Installation
```bash
pip install torch-module-monitor
```
---
## Features
**1. Monitor arbitrary metrics of activations, gradients, and parameters with a few lines of code**
- Add new metrics for activations, gradients, and parameters with a single line of code
- Regex-based filtering to determine what should be logged
- Monitor the internals of the attention operation (query/key/value tensor metrics, attention entropy)
- Aggregation of activation metrics across micro-batches
**2. Perform the Refined Coordinate Check (RCC) from [arXiv:2505.22491](https://arxiv.org/abs/2505.22491)**
- We provide an implementation of the refined coordinate check.
---
## Basic Monitoring
```python
from torch_module_monitor import ModuleMonitor

# Initialize and add metrics
monitor = ModuleMonitor(monitor_step_fn=lambda step: step % 10 == 0)
monitor.set_module(model)

monitor.add_activation_metric("mean", lambda x: x.mean())
monitor.add_parameter_metric("norm", lambda x: x.norm())
monitor.add_gradient_metric("norm", lambda x: x.norm())

# Training loop
for step, (inputs, targets) in enumerate(dataloader):
    monitor.begin_step(step)

    outputs = model(inputs)  # Activations captured via hooks
    loss = criterion(outputs, targets)
    loss.backward()

    monitor.monitor_parameters()
    monitor.monitor_gradients()

    optimizer.step()
    optimizer.zero_grad()
    monitor.end_step()

    # Log metrics
    if monitor.is_step_monitored(step):
        wandb.log(monitor.get_step_metrics())
```
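With gradient accumulation, one optimizer step spans several micro-batches, and the features above mention that activation metrics are aggregated across them. A minimal sketch of what mean aggregation looks like (the `RunningMean` helper and the averaging rule are illustrative, not the library's actual internals):

```python
class RunningMean:
    """Aggregates a scalar metric across micro-batches within one step."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += float(value)
        self.count += 1

    def result(self):
        # Mean of the metric over all micro-batches seen this step.
        return self.total / self.count


# Example: the same activation metric observed on three micro-batches.
agg = RunningMean()
for v in (0.1, 0.3, 0.2):
    agg.update(v)
print(agg.result())
```

The aggregated value, rather than per-micro-batch values, is what ends up in the step's metrics dict.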
## Complete Examples
**See [examples/](examples/) for complete examples:**
- `metrics.ipynb` - Basic metric monitoring
- `reference-model.ipynb` - Reference module comparison
- `refined-coordinate-check.ipynb` - Refined coordinate check
⚡We also show how to integrate the monitor into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).⚡
---
## Integration with Weights & Biases
Metrics are named so that they are grouped and visualized nicely in Weights & Biases.
Log the collected metrics in a single line of code:
```python
wandb.log(training_monitor.get_step_metrics(), step=current_step)
```
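The grouping relies on slash-separated key prefixes: W&B turns each prefix into its own panel section. The key names below are illustrative, not the library's exact output, but the dict shape is what `get_step_metrics()` is described as returning:

```python
from collections import defaultdict

# Hypothetical metrics dict: slash-separated prefixes ("activation/",
# "gradient/", ...) become separate panel sections in W&B.
step_metrics = {
    "activation/transformer.h0.mlp.mean": 0.012,
    "gradient/transformer.h0.mlp.norm": 1.7,
    "parameter/transformer.h0.mlp.norm": 3.4,
}

# Group keys by their prefix, as a W&B dashboard would:
panels = defaultdict(list)
for key in step_metrics:
    section, _, name = key.partition("/")
    panels[section].append(name)
print(dict(panels))
```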
Examples: TODO Provide Links
---
## Patterns
### Regex-Based Module Filtering
You can use a regex to specify that a metric should only be computed for specific tensors.
```python
# Monitor only MLP layers
monitor.add_activation_metric(
    "my_metric", my_metric, metric_regex=r".*mlp.*"
)
```
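Conceptually, the filter matches the pattern against each module's qualified name and skips non-matching modules. A minimal sketch of that matching rule (the `should_monitor` helper and the use of `re.fullmatch` are assumptions for illustration, not the library's documented behavior):

```python
import re


def should_monitor(module_name, metric_regex=None):
    """Return True if a metric with the given regex applies to this module."""
    if metric_regex is None:
        return True  # no filter: the metric applies everywhere
    return re.fullmatch(metric_regex, module_name) is not None


names = ["transformer.h0.attn", "transformer.h0.mlp", "lm_head"]
monitored = [n for n in names if should_monitor(n, r".*mlp.*")]
print(monitored)  # only the MLP module matches
```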
### Reference Module Comparison
In infinite-width theory, we often want to measure how activations and parameters differ from their values at initialization. We implement this via an arbitrary reference model against which the monitored model is compared.
```python
monitor.set_reference_module(reference_model)

# Track drift from initialization
monitor.add_parameter_difference_metric(
    "l2_distance", lambda p, p_ref: (p - p_ref).norm()
)
```
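The quantity the `l2_distance` metric computes is just the Euclidean distance between a parameter and its reference counterpart. A plain-Python sketch of that computation, with made-up parameter values for illustration:

```python
import math


def l2_distance(p, p_ref):
    """Euclidean distance between two parameter vectors (as plain lists)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_ref)))


p_init = [0.5, -0.3, 0.1]  # parameter values at initialization (reference)
p_now = [0.7, -0.1, 0.1]   # the same parameter after some training

drift = l2_distance(p_now, p_init)
print(drift)
```

In practice the reference module is typically a deep copy of the model taken before training starts, so the metric tracks drift from initialization.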
## Complex Modules
By default, we monitor the activations of modules that return a single tensor. To monitor statistics of complex modules, these modules can implement `MonitorMixin`. We use this approach to monitor the internals of the attention operation.
```python
import torch.nn as nn
import torch.nn.functional as F

from torch_module_monitor import MonitorMixin, monitor_scaled_dot_product_attention


class MultiHeadAttention(nn.Module, MonitorMixin):
    def forward(self, x):
        q, k, v = self.compute_qkv(x)
        attn_output = F.scaled_dot_product_attention(q, k, v)

        if self.is_monitoring:
            monitor_scaled_dot_product_attention(
                self.get_module_monitor(), module=self,
                query=q, key=k, value=v, activation=attn_output
            )

        return self.output_projection(attn_output)
```
This logs per-head metrics: `activation/{module}.head_{i}.query`, `attention_entropy/{module}.head_{i}`, etc.
**Custom metrics in any module:**
```python
import torch.nn as nn

from torch_module_monitor import MonitorMixin


class CustomLayer(nn.Module, MonitorMixin):
    def forward(self, x):
        output = self.transform(x)

        if self.is_monitoring:
            self.get_module_monitor().log_tensor("custom_stat", output.norm(dim=-1))

        return output
```
## Multi-GPU Support
In principle, the monitor can support multi-GPU training, though we do not provide direct support for any particular parallelization strategy. With FSDP, for example, every GPU can run its own monitor. However, we do not currently implement the synchronization of activation metrics across GPUs, and the refined coordinate check has only been tested in single-GPU training.
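The missing synchronization step amounts to reducing per-rank metric dicts into one dict before logging from rank 0. A pure-Python sketch of that reduction (the helper name and the choice of averaging are assumptions; a real implementation would use `torch.distributed.all_reduce` on tensors instead of Python dicts):

```python
def average_metric_dicts(per_rank_metrics):
    """Average per-GPU metric dicts with identical keys into one dict."""
    keys = per_rank_metrics[0].keys()
    n = len(per_rank_metrics)
    return {k: sum(m[k] for m in per_rank_metrics) / n for k in keys}


# Hypothetical per-rank metrics from two GPUs running their own monitor:
rank_metrics = [
    {"activation/mlp.mean": 0.10, "gradient/mlp.norm": 1.0},  # GPU 0
    {"activation/mlp.mean": 0.30, "gradient/mlp.norm": 3.0},  # GPU 1
]

avg = average_metric_dicts(rank_metrics)
print(avg)
```

Whether averaging is the right reduction depends on the metric (e.g. a max-based metric would need a max reduction), which is one reason this is left to the user.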
---
## Citation
If you use this code, please cite:
```bibtex
@inproceedings{haas2025splargelr,
  title={On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling},
  author={Haas, Moritz and Bordt, Sebastian and von Luxburg, Ulrike and Vankadara, Leena Chennuru},
  booktitle={Advances in Neural Information Processing Systems 38},
  year={2025}
}
```
## Contributing
We provide this code as-is. We may accept pull requests that fix bugs or add new features.
Raw data
{
"_id": null,
"home_page": null,
"name": "torch-module-monitor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pytorch, deep-learning, diagnostics, monitoring, training, mup, coordinate-check, activations, gradients",
"author": null,
"author_email": "Sebastian Bordt <sbordt@posteo.de>",
"download_url": "https://files.pythonhosted.org/packages/7c/9a/d06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591/torch_module_monitor-0.1.0.tar.gz",
"platform": null,
"description": "# Monitor the training of your PyTorch modules\n\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/MIT)\n\nA simple research code base to monitor the training of small-to-medium neural networks. Log arbitrary metrics of activations, gradients, and parameters to W&B with a few lines of code!\n\nWe also implement the refined coordinate check (RCC) from the NeurIPS 2025 paper [\"On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling\"](https://arxiv.org/abs/2505.22491) (Haas et al., 2025).\n\n\u26a1For a complete working example, see how the monitor can be integrated into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).\u26a1\n\n### Installation\n\n```bash\npip install torch-module-monitor\n```\n\n---\n\n## Features\n\n**1. Monitor arbitrary metrics of activations, gradients, and parameters with a few lines of code**\n- Add new metrics for activations, gradients, and parameters with a single line of code.\n- Regex-based filtering to determine what should be logged\n- Monitor the internals of the attention operation (query/key/value tensor metrics, attention entropy)\n- Aggregation of activation metrics across micro-batches\n\n**2. 
Perform the Refined Coordinate Check (RCC) from https://arxiv.org/abs/2505.22491**\n- We provide an implementation of the refined coordinate check.\n\n---\n\n## Basic Monitoring\n\n```python\nfrom torch_module_monitor import ModuleMonitor\n\n# Initialize and add metrics\nmonitor = ModuleMonitor(monitor_step_fn=lambda step: step % 10 == 0)\nmonitor.set_module(model)\n\nmonitor.add_activation_metric(\"mean\", lambda x: x.mean())\nmonitor.add_parameter_metric(\"norm\", lambda x: x.norm())\nmonitor.add_gradient_metric(\"norm\", lambda x: x.norm())\n\n# Training loop\nfor step, (inputs, targets) in enumerate(dataloader):\n monitor.begin_step(step)\n\n outputs = model(inputs) # Activations captured via hooks\n loss = criterion(outputs, targets)\n loss.backward()\n\n monitor.monitor_parameters()\n monitor.monitor_gradients()\n\n optimizer.step()\n optimizer.zero_grad()\n monitor.end_step()\n\n # Log metrics\n if monitor.is_step_monitored(step):\n wandb.log(monitor.get_step_metrics())\n```\n\n## Complete Examples\n\n**See [examples/](examples/) for complete examples:**\n- `metrics.ipynb` - Basic metric monitoring\n- `reference-model.ipynb` - Reference module comparison\n- `refined-coordinate-check.ipynb` - Refined coordinate check\n\n\u26a1We also show how to integrate the monitor into [nanoGPT](https://github.com/tml-tuebingen/nanoGPT-monitored).\u26a1\n\n\n---\n\n## Integration with Weights & Biases\n\nWe name the different metrics such that they are nicely visualized in Weights & Biases.\n\nLog the collected metrics in a single line of code:\n\n```python\nwandb.log(training_monitor.get_step_metrics(), step=current_step)\n```\n\nExamples: TODO Provide Links\n\n---\n\n## Patterns\n\n### Regex-Based Module Filtering\n\nYou can use a regex to specify that a metric should only be computed for specific tensors.\n\n```python\n# Monitor only attention layers\nmonitor.add_activation_metric(\n \"my_metric\", my_metric(x), metric_regex=r\".*mlp.*\"\n)\n```\n\n### Reference Module 
Comparison\n\nIn infinite width theory, we often want to measure the difference of activations and parameters to the model at initialization. We implement this via an arbitrary reference model to which our model can be compared.\n\n```python\nmonitor.set_reference_module(reference_model)\n\n# Track drift from initialization\nmonitor.add_parameter_difference_metric(\n \"l2_distance\", lambda p, p_ref: (p - p_ref).norm()\n)\n```\n\n## Complex Modules\n\nBy default, we monitor the activations of modules that return a single tensor. To monitor statistics of complex modules, these modules can implement `MonitorMixin`. We use this approach to monitor the internals of the attention operation. \n\n```python\nfrom torch_module_monitor import MonitorMixin, monitor_scaled_dot_product_attention\n\nclass MultiHeadAttention(nn.Module, MonitorMixin):\n def forward(self, x):\n q, k, v = self.compute_qkv(x)\n attn_output = F.scaled_dot_product_attention(q, k, v)\n\n if self.is_monitoring:\n monitor_scaled_dot_product_attention(\n self.get_module_monitor(), module=self,\n query=q, key=k, value=v, activation=attn_output\n )\n\n return self.output_projection(attn_output)\n```\n\nThis logs per-head metrics: `activation/{module}.head_{i}.query`, `attention_entropy/{module}.head_{i}`, etc.\n\n**Custom metrics in any module:**\n\n```python\nfrom torch_module_monitor import MonitorMixin\n\nclass CustomLayer(nn.Module, MonitorMixin):\n def forward(self, x):\n output = self.transform(x)\n\n if self.is_monitoring:\n self.get_module_monitor().log_tensor(\"custom_stat\", output.norm(dim=-1))\n\n return output\n```\n\n## Multi-GPU Support\n\nIn principle, the monitor can support multi-GPU training, though we do not provide direct support for any parallelization strategy. With FSDP, for example, every GPU could have its own monitor. However, we do not currently implement the synchronization of activation metrics across GPUs. The refined coordinate check was only tested for single-GPU training. 
\n\n---\n\n## Citation\n\nIf you use this code, please cite:\n\n```bibtex\n@inproceedings{haas2025splargelr,\n title={On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling},\n author={Haas, Moritz and Bordt, Sebastian and von Luxburg, Ulrike and Vankadara, Leena Chennuru},\n booktitle={Advances in Neural Information Processing Systems 38},\n year={2025}\n}\n```\n\n## Contributing\n\nWe provide this code as-is. We may accept pull requests that fix bugs or add new features. \n\n\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2025 Sebastian Bordt\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.\n ",
"summary": "Diagnostics for PyTorch model training - monitor activations, parameters, and gradients",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://torch-module-monitor.readthedocs.io",
"Homepage": "https://github.com/tml-tuebingen/torch-module-monitor",
"Issues": "https://github.com/tml-tuebingen/torch-module-monitor/issues",
"Source": "https://github.com/tml-tuebingen/torch-module-monitor"
},
"split_keywords": [
"pytorch",
" deep-learning",
" diagnostics",
" monitoring",
" training",
" mup",
" coordinate-check",
" activations",
" gradients"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9cde7e7ac9a84aef174a128e2d1e7c71a356cbfb4c030ac4e405308fa40ffea5",
"md5": "6370df8ad4630dd79d6239c89fb6b9ad",
"sha256": "bf723f2ac6fc1f227431c8c76ced268a36675aa47869f151888340aec8831cd4"
},
"downloads": -1,
"filename": "torch_module_monitor-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6370df8ad4630dd79d6239c89fb6b9ad",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 21036,
"upload_time": "2025-10-21T22:28:50",
"upload_time_iso_8601": "2025-10-21T22:28:50.225072Z",
"url": "https://files.pythonhosted.org/packages/9c/de/7e7ac9a84aef174a128e2d1e7c71a356cbfb4c030ac4e405308fa40ffea5/torch_module_monitor-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7c9ad06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591",
"md5": "53ece6032c4475c0358570cdf9f82784",
"sha256": "cdaec9bff1d6e7f494c31f53dd77a65a01a218969ca9e6ebd56c83cbb7554cbd"
},
"downloads": -1,
"filename": "torch_module_monitor-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "53ece6032c4475c0358570cdf9f82784",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 26655,
"upload_time": "2025-10-21T22:28:53",
"upload_time_iso_8601": "2025-10-21T22:28:53.182399Z",
"url": "https://files.pythonhosted.org/packages/7c/9a/d06fdf798c087e52ba7ec19e252110674959b4b6ca40a43f12d16bc02591/torch_module_monitor-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 22:28:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tml-tuebingen",
"github_project": "torch-module-monitor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "torch-module-monitor"
}