# TorchSnapshot (Beta Release)
<p align="center">
<a href="https://github.com/pytorch/torchsnapshot/actions?query=branch%3Amain"><img src="https://img.shields.io/github/actions/workflow/status/pytorch/torchsnapshot/.github/workflows/run_tests.yaml?branch=main" alt="build status"></a>
<a href="https://pypi.org/project/torchsnapshot"><img src="https://img.shields.io/pypi/v/torchsnapshot" alt="pypi version"></a>
<a href="https://anaconda.org/conda-forge/torchsnapshot"><img src="https://img.shields.io/conda/vn/conda-forge/torchsnapshot" alt="conda version"></a>
<a href="https://pypi.org/project/torchsnapshot-nightly"><img src="https://img.shields.io/pypi/v/torchsnapshot-nightly?label=nightly" alt="pypi nightly version"></a>
<a href="https://codecov.io/gh/pytorch/torchsnapshot"><img src="https://codecov.io/gh/pytorch/torchsnapshot/branch/main/graph/badge.svg?token=DR67Q6T7YF" alt="codecov"></a>
<a href="https://github.com/pytorch/torchsnapshot/blob/main/LICENSE"><img src="https://img.shields.io/pypi/l/torchsnapshot" alt="bsd license"></a>
</p>
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.
## Install
Requires Python >= 3.8 and PyTorch >= 2.0.0
From pip or conda:
```bash
# Stable
pip install torchsnapshot
# Or, using conda
conda install -c conda-forge torchsnapshot
# Nightly
pip install --pre torchsnapshot-nightly
```
From source:
```bash
git clone https://github.com/pytorch/torchsnapshot
cd torchsnapshot
pip install -r requirements.txt
pip install .
```
## Why TorchSnapshot
**Performance**
- TorchSnapshot provides a fast checkpointing implementation that employs various optimizations, including zero-copy serialization for most tensor types, overlapped device-to-host copies and storage I/O, and parallelized storage I/O.
- TorchSnapshot greatly speeds up checkpointing for DistributedDataParallel workloads by distributing the write load across all ranks ([benchmark](https://github.com/pytorch/torchsnapshot/tree/main/benchmarks/ddp)).
- When host memory is abundant, TorchSnapshot allows training to resume before all storage I/O completes, reducing the time blocked by checkpoint saving.
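The overlap between device-to-host copies and storage I/O can be illustrated with a toy pure-Python sketch. This is not torchsnapshot's implementation — just the pipelining idea: a bounded staging queue caps host memory use while a background thread performs the writes, so producing the next chunk overlaps with writing the previous one.

```python
import queue
import threading

def save_pipelined(chunks, path):
    """Toy pipelined writer: stage each chunk into a bounded queue while a
    background thread drains it to storage. Stands in for torchsnapshot's
    overlapped copy/IO only conceptually."""
    q = queue.Queue(maxsize=2)  # bounded staging area caps host memory use

    def writer():
        with open(path, "wb") as f:
            while True:
                buf = q.get()
                if buf is None:  # sentinel: no more chunks
                    break
                f.write(buf)

    t = threading.Thread(target=writer)
    t.start()
    for chunk in chunks:
        staged = bytes(chunk)  # stands in for the device-to-host copy
        q.put(staged)          # hand off; the next "copy" overlaps this write
    q.put(None)
    t.join()
```

Because the queue is bounded, memory stays flat even when storage is slower than the producer — the same pressure-adaptive behavior the library aims for.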
**Memory Usage**
- TorchSnapshot's memory usage adapts to the host's available resources, greatly reducing the chance of out-of-memory issues when saving and loading checkpoints.
- TorchSnapshot supports efficient random access to individual objects within a snapshot, even when the snapshot is stored in cloud object storage.
**Usability**
- Simple APIs that are consistent between distributed and non-distributed workloads.
- Out of the box integration with commonly used cloud object storage systems.
- Automatic resharding (elasticity) on world size change for supported workloads ([more details](https://pytorch.org/torchsnapshot/getting_started.html#elasticity-experimental)).
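The resharding idea behind elasticity can be sketched in plain Python: a flat parameter saved from N ranks is reassembled and re-chunked for M ranks on load. This is a deliberate simplification — torchsnapshot's actual support operates on sharded tensors, not Python lists — but it shows why a world-size change requires redistributing the saved shards.

```python
def shard(flat, world_size):
    """Split a flat sequence into world_size contiguous, near-equal shards."""
    n, rem = divmod(len(flat), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = n + (1 if rank < rem else 0)  # spread the remainder
        shards.append(flat[start:start + size])
        start += size
    return shards

def reshard(shards, new_world_size):
    """Reassemble saved shards, then re-split for the new world size."""
    flat = [x for s in shards for x in s]
    return shard(flat, new_world_size)
```

For example, a parameter saved from 4 ranks as shards of sizes `[3, 3, 2, 2]` would be reloaded on 3 ranks as shards of sizes `[4, 3, 3]`, with the concatenated contents unchanged.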
**Security**
- Secure tensor serialization without a pickle dependency [WIP].
## Getting Started
```python
from torchsnapshot import Snapshot
# Taking a snapshot
app_state = {"model": model, "optimizer": optimizer}
snapshot = Snapshot.take(path="/path/to/snapshot", app_state=app_state)
# Restoring from a snapshot
snapshot.restore(app_state=app_state)
```
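The values in `app_state` can be any objects that expose the same `state_dict()`/`load_state_dict()` protocol as `torch.nn.Module`, so custom training state can be checkpointed alongside the model. The class below is a hypothetical example of such an object, not a library API:

```python
class TrainProgress:
    """Hypothetical stateful object tracking training progress; any object
    with state_dict()/load_state_dict() can be placed in app_state."""

    def __init__(self):
        self.step = 0
        self.epoch = 0

    def state_dict(self):
        return {"step": self.step, "epoch": self.epoch}

    def load_state_dict(self, state):
        self.step = state["step"]
        self.epoch = state["epoch"]

# e.g. app_state = {"model": model, "optimizer": optimizer,
#                   "progress": TrainProgress()}
```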
See the [documentation](https://pytorch.org/torchsnapshot/main/getting_started.html) for more details.
## License
torchsnapshot is BSD licensed, as found in the [LICENSE](LICENSE) file.