# pytorch-posthoc-ema

Choose your EMA decay rate after training. No need to decide upfront.

The library uses `sigma_rel` (the relative standard deviation of the averaging profile) to parameterize EMA decay, which corresponds roughly to the classical EMA decay rate `beta` as follows:

```python
beta = 0.9999  # Very slow decay -> sigma_rel ≈ 0.01
beta = 0.9990  # Slow decay     -> sigma_rel ≈ 0.03
beta = 0.9900  # Medium decay   -> sigma_rel ≈ 0.10
beta = 0.9000  # Fast decay     -> sigma_rel ≈ 0.27
```
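
Under the hood, post-hoc EMA (Karras et al., 2023) parameterizes the averaging profile as a power function with exponent `gamma`, related to `sigma_rel` by `sigma_rel^2 = (gamma + 1) / ((gamma + 2)^2 (gamma + 3))`. A minimal sketch of the conversion (the function name is illustrative; the library handles this internally):

```python
import numpy as np

def sigma_rel_to_gamma(sigma_rel: float) -> float:
    """Largest real root of (gamma + 2)^2 (gamma + 3) = sigma_rel^-2 * (gamma + 1)."""
    t = sigma_rel ** -2
    # Expanding gives the cubic gamma^3 + 7 gamma^2 + (16 - t) gamma + (12 - t) = 0.
    return float(np.roots([1.0, 7.0, 16.0 - t, 12.0 - t]).real.max())
```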

This library was adapted from [ema-pytorch](https://github.com/lucidrains/ema-pytorch) by lucidrains.

New features and changes:

- No extra VRAM usage: EMA weights are kept on the CPU
- No extra VRAM usage for EMA synthesis during evaluation
- Low RAM usage for EMA synthesis
- Simplified, more explicit usage
- Opinionated defaults
- Configurable number of checkpoints to keep
- "Switch EMA" support with PostHocEMA
- Visualization of EMA reconstruction error before training

## Install

```bash
pip install pytorch-posthoc-ema
```

or

```bash
poetry add pytorch-posthoc-ema
```

## Basic Usage

```python
import torch
from posthoc_ema import PostHocEMA

model = torch.nn.Linear(512, 512)

posthoc_ema = PostHocEMA.from_model(model, "posthoc-ema")

for _ in range(1000):
    # mutate your network; in real training this would be an optimizer step
    with torch.no_grad():
        model.weight.copy_(torch.randn_like(model.weight))
        model.bias.copy_(torch.randn_like(model.bias))

    posthoc_ema.update_(model)

data = torch.randn(1, 512)
predictions = model(data)

# synthesize an EMA model for any sigma_rel via the context-manager helper
with posthoc_ema.model(model, sigma_rel=0.15) as ema_model:
    ema_predictions = ema_model(data)
```
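
Because the decay rate is chosen after the fact, you can synthesize and compare several candidates in one session (the values below are illustrative):

```python
# Evaluate the same model under several candidate decay rates.
with torch.no_grad():
    for sigma_rel in (0.05, 0.10, 0.15):
        with posthoc_ema.model(model, sigma_rel=sigma_rel) as ema_model:
            out = ema_model(data)
            print(f"sigma_rel={sigma_rel}: output norm {out.norm().item():.3f}")
```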

### Load After Training

```python
# With model
posthoc_ema = PostHocEMA.from_path("posthoc-ema", model)
with posthoc_ema.model(model, sigma_rel=0.15) as ema_model:
    ema_predictions = ema_model(data)

# Without model
posthoc_ema = PostHocEMA.from_path("posthoc-ema")
with posthoc_ema.state_dict(sigma_rel=0.15) as state_dict:
    model.load_state_dict(state_dict, strict=False)
```

## Advanced Usage

### Switch EMA During Training

To apply "Switch EMA" (Li et al., 2024), periodically load the synthesized EMA weights back into the model being trained:

```python
with posthoc_ema.state_dict(sigma_rel=0.15) as state_dict:
    model.load_state_dict(state_dict, strict=False)
```
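
A sketch of how this fits into a training loop; `train_step`, `num_steps`, and the switch interval are hypothetical placeholders:

```python
switch_every = 1000  # hypothetical interval between switches

for step in range(num_steps):
    train_step(model)           # your usual forward/backward/optimizer step
    posthoc_ema.update_(model)  # keep the EMA checkpoints up to date

    if step > 0 and step % switch_every == 0:
        # Switch EMA: replace the live weights with synthesized EMA weights.
        with posthoc_ema.state_dict(sigma_rel=0.15) as state_dict:
            model.load_state_dict(state_dict, strict=False)
```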

### Visualize Reconstruction Quality

Visualize how accurately target `sigma_rel` values can be reconstructed with the current configuration:

```python
posthoc_ema.reconstruction_error()
```

### Configuration

```python
posthoc_ema = PostHocEMA.from_model(
    model,
    checkpoint_dir="path/to/checkpoints",
    max_checkpoints=20,  # Keep last 20 checkpoints per EMA model
    sigma_rels=(0.05, 0.28),  # Default relative standard deviations from paper
    update_every=10,  # Update EMA weights every 10 steps
    checkpoint_every=1000,  # Create checkpoints every 1000 steps
    checkpoint_dtype=torch.float16,  # Store checkpoints in half precision
)
```
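
As a rough guide to disk usage with these settings: each entry in `sigma_rels` maintains its own series of checkpoints, so total storage is about `num_params * bytes_per_param * len(sigma_rels) * max_checkpoints`. A back-of-the-envelope sketch:

```python
# Rough checkpoint storage estimate for the configuration above.
num_params = sum(p.numel() for p in model.parameters())
bytes_per_param = 2   # checkpoint_dtype=torch.float16
num_ema_models = 2    # len(sigma_rels) above
max_checkpoints = 20  # per EMA model, as configured
total_gb = num_params * bytes_per_param * num_ema_models * max_checkpoints / 1e9
print(f"~{total_gb:.2f} GB of EMA checkpoints on disk")
```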

## Citations

```bibtex
@article{Karras2023AnalyzingAI,
    title   = {Analyzing and Improving the Training Dynamics of Diffusion Models},
    author  = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.02696}
}
```

```bibtex
@article{Lee2024SlowAS,
    title   = {Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks},
    author  = {Hojoon Lee and Hyeonseo Cho and Hyunseung Kim and Donghu Kim and Dugki Min and Jaegul Choo and Clare Lyle},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2406.02596}
}
```

```bibtex
@article{Li2024SwitchEA,
    title   = {Switch EMA: A Free Lunch for Better Flatness and Sharpness},
    author  = {Siyuan Li and Zicheng Liu and Juanxi Tian and Ge Wang and Zedong Wang and Weiyang Jin and Di Wu and Cheng Tan and Tao Lin and Yang Liu and Baigui Sun and Stan Z. Li},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2402.09240}
}
```

            
