| Field | Value |
| --- | --- |
| Name | zeus-ml |
| Version | 0.8.1 |
| Summary | A framework for deep learning energy measurement and optimization. |
| Upload time | 2024-02-25 22:03:34 |
| Requires Python | >=3.8 |
| License | Apache 2.0 |
| Keywords | deep-learning, power, energy, sustainability, mlsys |

<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/img/logo_dark.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/img/logo_light.svg">
<img alt="Zeus logo" width="55%" src="docs/assets/img/logo_dark.svg">
</picture>
<h1>Deep Learning Energy Measurement and Optimization</h1>
</div>

[![NSDI23 paper](https://custom-icon-badges.herokuapp.com/badge/NSDI'23-paper-b31b1b.svg)](https://www.usenix.org/conference/nsdi23/presentation/you)
[![Docker Hub](https://badgen.net/docker/pulls/symbioticlab/zeus?icon=docker&label=Docker%20pulls)](https://hub.docker.com/r/mlenergy/zeus)
[![Slack workspace](https://badgen.net/badge/icon/Join%20workspace/611f69?icon=slack&label=Slack)](https://join.slack.com/t/zeus-ml/shared_invite/zt-1najba5mb-WExy7zoNTyaZZfTlUWoLLg)
[![Homepage build](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml/badge.svg)](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml)
[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/ml-energy/zeus?logo=law)](/LICENSE)

---
**Project News** ⚡
- \[2023/12\] The preprint of the Perseus paper is out [here](https://arxiv.org/abs/2312.06902)!
- \[2023/10\] We released Perseus, an energy optimizer for large model training. Get started [here](https://ml.energy/zeus/perseus/)!
- \[2023/09\] We moved under [`ml-energy`](https://github.com/ml-energy)! Please stay tuned for new exciting projects!
- \[2023/07\] [`ZeusMonitor`](https://ml.energy/zeus/reference/monitor/#zeus.monitor.ZeusMonitor) was used to profile GPU time and energy consumption for the [ML.ENERGY leaderboard & Colosseum](https://ml.energy/leaderboard).
- \[2023/03\] [Chase](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf), an automatic carbon optimization framework for DNN training, will appear at an ICLR'23 workshop.
- \[2022/11\] [Carbon-Aware Zeus](https://taikai.network/gsf/hackathons/carbonhack22/projects/cl95qxjpa70555701uhg96r0ek6/idea) won the **second overall best solution award** at Carbon Hack 22.
---
Zeus is a framework for (1) measuring GPU energy consumption and (2) optimizing energy and time for DNN training.
### Measuring GPU energy
```python
from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])

monitor.begin_window("heavy computation")
# Four GPUs consuming energy like crazy!
measurement = monitor.end_window("heavy computation")

print(f"Energy: {measurement.total_energy} J")
print(f"Time  : {measurement.time} s")
```
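Because windows are opened and closed by name, several can be active at once. The sketch below shows one way that bookkeeping could work, assuming each GPU exposes a cumulative energy counter (NVML provides one on recent NVIDIA GPUs). `EnergyWindowTracker` and the injected `read_counters` callable are hypothetical stand-ins, not Zeus's implementation.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical counter reader: returns cumulative energy (J) per GPU index.
CounterReader = Callable[[], Dict[int, float]]


class EnergyWindowTracker:
    """Minimal sketch of name-keyed measurement windows (not Zeus's code)."""

    def __init__(self, read_counters: CounterReader) -> None:
        self._read = read_counters
        # Window name -> (start timestamp, counter snapshot at start).
        self._open: Dict[str, Tuple[float, Dict[int, float]]] = {}

    def begin_window(self, name: str) -> None:
        if name in self._open:
            raise ValueError(f"window {name!r} is already open")
        self._open[name] = (time.monotonic(), self._read())

    def end_window(self, name: str) -> Tuple[float, Dict[int, float]]:
        # Raises KeyError if the window was never begun.
        start_time, start_counters = self._open.pop(name)
        end_counters = self._read()
        elapsed = time.monotonic() - start_time
        # Energy spent in the window is the difference of the cumulative counters.
        energy = {gpu: end_counters[gpu] - start_counters[gpu] for gpu in start_counters}
        return elapsed, energy
```

Nesting works because each window keeps its own starting snapshot: with GPU 0's counter reading 100, 130, 150, and 160 J at successive calls, an inner window spanning the middle two readings reports 20 J while the enclosing window reports 60 J.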
### Finding the optimal GPU power limit
Zeus silently profiles different GPU power limits during training and converges to the one it deems optimal.
```python
from zeus.monitor import ZeusMonitor
from zeus.optimizer import GlobalPowerLimitOptimizer

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])
plo = GlobalPowerLimitOptimizer(monitor)

plo.on_epoch_begin()

# `train_dataloader` is your existing training data loader.
for x, y in train_dataloader:
    plo.on_step_begin()
    # Learn from x and y!
    plo.on_step_end()

plo.on_epoch_end()
```
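Which power limit counts as optimal depends on the selection criterion. The Zeus paper weighs energy against time with a knob η, using the cost η·ETA + (1 − η)·MAXPOWER·TTA. Since the power limit does not change how many steps training needs, ranking power limits by per-step energy and time gives the same order as ranking by ETA and TTA. The sketch below applies that cost to per-power-limit profiling results; the `profiles` numbers are made up, and the library's default selection criterion may differ.

```python
from typing import Dict, Tuple

# Hypothetical profiling results: power limit (W) -> (seconds per step, joules per step).
Profile = Dict[int, Tuple[float, float]]


def zeus_cost(time_s: float, energy_j: float, eta: float, max_power_w: float) -> float:
    """Energy-time cost from the Zeus paper: eta * energy + (1 - eta) * max_power * time."""
    return eta * energy_j + (1.0 - eta) * max_power_w * time_s


def pick_power_limit(profiles: Profile, eta: float, max_power_w: float) -> int:
    """Return the profiled power limit with the lowest cost."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be in [0, 1]")
    return min(
        profiles,
        key=lambda pl: zeus_cost(profiles[pl][0], profiles[pl][1], eta, max_power_w),
    )


profiles = {300: (1.00, 300.0), 250: (1.05, 240.0), 200: (1.30, 230.0)}
print(pick_power_limit(profiles, eta=0.5, max_power_w=300.0))  # 250
```

Setting `eta=1.0` minimizes energy alone (here 200 W), while `eta=0.0` minimizes time alone (here 300 W); intermediate values trade one against the other.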
### CLI power and energy monitor
```console
$ python -m zeus.monitor power
[2023-08-22 22:39:59,787] [PowerMonitor](power.py:134) Monitoring power usage of GPUs [0, 1, 2, 3]
2023-08-22 22:40:00.800576
{'GPU0': 66.176, 'GPU1': 68.792, 'GPU2': 66.898, 'GPU3': 67.53}
2023-08-22 22:40:01.842590
{'GPU0': 66.078, 'GPU1': 68.595, 'GPU2': 66.996, 'GPU3': 67.138}
2023-08-22 22:40:02.845734
{'GPU0': 66.078, 'GPU1': 68.693, 'GPU2': 66.898, 'GPU3': 67.236}
2023-08-22 22:40:03.848818
{'GPU0': 66.177, 'GPU1': 68.675, 'GPU2': 67.094, 'GPU3': 66.926}
^C
Total time (s): 4.421529293060303
Total energy (J):
{'GPU0': 198.52566362297537, 'GPU1': 206.22215216255188, 'GPU2': 201.08565518283845, 'GPU3': 201.79834523367884}
```
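The power monitor reports instantaneous power in watts, whereas energy is power integrated over time. A common way to turn discrete samples into energy is the trapezoidal rule, sketched below. Whether Zeus's `PowerMonitor` integrates exactly this way is an assumption, and the sample values are illustrative rather than taken from the log above.

```python
from typing import List, Tuple

# (timestamp in seconds, power in watts) samples for one GPU, sorted by time.
Samples = List[Tuple[float, float]]


def energy_from_power(samples: Samples) -> float:
    """Approximate energy in joules by trapezoidal integration of power samples."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        # Trapezoid between consecutive samples: average power times the interval.
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


# Four readings one second apart at a steady 100 W (illustrative values).
print(energy_from_power([(0.0, 100.0), (1.0, 100.0), (2.0, 100.0), (3.0, 100.0)]))  # 300.0
```

The integral only covers the span between the first and last sample, so denser sampling both tightens the approximation and reduces the unmeasured time at the edges.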
```console
$ python -m zeus.monitor energy
[2023-08-22 22:44:45,106] [ZeusMonitor](energy.py:157) Monitoring GPU [0, 1, 2, 3].
[2023-08-22 22:44:46,210] [zeus.util.framework](framework.py:38) PyTorch with CUDA support is available.
[2023-08-22 22:44:46,760] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' started.
^C[2023-08-22 22:44:50,205] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' ended.
Total energy (J):
Measurement(time=3.4480526447296143, energy={0: 224.2969999909401, 1: 232.83799999952316, 2: 233.3100000023842, 3: 234.53700000047684})
```
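The energy monitor prints a `Measurement` with a window duration in seconds and a per-GPU energy dictionary in joules; dividing energy by time gives each GPU's average power over the window. The dataclass below is a hypothetical stand-in that only copies the printed field names (`time`, `energy`) and the `total_energy` attribute used in the first example; the real class may have other members.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MeasurementSketch:
    """Hypothetical stand-in mirroring the printed Measurement fields."""

    time: float  # Window duration in seconds.
    energy: Dict[int, float] = field(default_factory=dict)  # GPU index -> joules.

    @property
    def total_energy(self) -> float:
        """Sum of energy across all measured GPUs, in joules."""
        return sum(self.energy.values())

    def average_power(self) -> Dict[int, float]:
        """Average power in watts per GPU over the window."""
        if self.time <= 0:
            raise ValueError("window duration must be positive")
        return {gpu: joules / self.time for gpu, joules in self.energy.items()}


m = MeasurementSketch(time=2.0, energy={0: 130.0, 1: 140.0})
print(m.total_energy)     # 270.0
print(m.average_power())  # {0: 65.0, 1: 70.0}
```

Applied to the log above, GPU 0's 224.3 J over about 3.45 s works out to roughly 65 W, in line with the idle readings from the power monitor.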
Please refer to our NSDI’23 [paper](https://www.usenix.org/conference/nsdi23/presentation/you) and [slides](https://www.usenix.org/system/files/nsdi23_slides_chung.pdf) for details.
Check out [Overview](https://ml.energy/zeus/overview/) for a summary.
Zeus is part of [The ML.ENERGY Initiative](https://ml.energy).
## Repository Organization
```
.
├── zeus/ # ⚡ Zeus Python package
│ ├── optimizer/ # - GPU energy and time optimizers
│ ├── run/ # - Tools for running Zeus on real training jobs
│ ├── policy/ # - Optimization policies and extension interfaces
│ ├── util/ # - Utility functions and classes
│ ├── monitor.py # - `ZeusMonitor`: Measure GPU time and energy of any code block
│ ├── controller.py # - Tools for controlling the flow of training
│ ├── callback.py # - Base class for Hugging Face-like training callbacks.
│ ├── simulate.py # - Tools for trace-driven simulation
│ ├── analyze.py # - Analysis functions for power logs
│ └── job.py # - Class for job specification
│
├── zeus_monitor/ # 🔌 GPU power monitor
│ ├── zemo/ # - A header-only library for querying NVML
│ └── main.cpp # - Source code of the power monitor
│
├── examples/ # 🛠️ Examples of integrating Zeus
│
├── capriccio/ # 🌊 A drifting sentiment analysis dataset
│
└── trace/ # 🗃️ Train and power traces for various GPUs and DNNs
```
## Getting Started
Refer to [Getting started](https://ml.energy/zeus/getting_started) for complete instructions on environment setup, installation, and integration.
### Docker image
We provide a Docker image fully equipped with all dependencies and environments.
The only command you need is:
```sh
docker run -it \
--gpus all `# Mount all GPUs` \
--cap-add SYS_ADMIN `# Needed to change the power limit of the GPU` \
--ipc host `# PyTorch DataLoader workers need enough shm` \
mlenergy/zeus:latest \
bash
```
Refer to [Environment setup](https://ml.energy/zeus/getting_started/environment/) for details.
### Examples
We provide working examples for integrating and running Zeus in the `examples/` directory.
## Extending Zeus
You can easily implement custom policies for batch size and power limit optimization and plug them into Zeus.
Refer to [Extending Zeus](https://ml.energy/zeus/extend/) for details.
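Zeus's own extension interfaces live in `zeus/policy/` and are documented at the link above; the sketch below does not reproduce them. It illustrates the kind of logic a custom batch size policy might contain, using a hypothetical `BatchSizePolicy` interface and Gaussian Thompson sampling over candidate batch sizes, with a flat prior and an assumed known noise level. Each observed cost could be, for example, the energy-time cost of one training run.

```python
import math
import random
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional


class BatchSizePolicy(ABC):
    """Hypothetical policy interface; Zeus's real interfaces live in zeus/policy/."""

    @abstractmethod
    def choose(self) -> int:
        """Return the batch size to try next."""

    @abstractmethod
    def observe(self, batch_size: int, cost: float) -> None:
        """Record the cost (lower is better) observed for a batch size."""


class GaussianThompsonPolicy(BatchSizePolicy):
    """Gaussian Thompson sampling with a flat prior and an assumed known noise std."""

    def __init__(self, batch_sizes: Iterable[int], noise_std: float,
                 seed: Optional[int] = None) -> None:
        self.costs: Dict[int, List[float]] = {b: [] for b in batch_sizes}
        if not self.costs:
            raise ValueError("need at least one batch size")
        self.noise_std = noise_std
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Try every batch size once before sampling from the posteriors.
        for b, obs in self.costs.items():
            if not obs:
                return b
        # The posterior of each mean cost is N(sample mean, noise_std^2 / n);
        # draw one value per batch size and pick the lowest.
        draws = {
            b: self.rng.gauss(sum(obs) / len(obs), self.noise_std / math.sqrt(len(obs)))
            for b, obs in self.costs.items()
        }
        return min(draws, key=draws.__getitem__)

    def observe(self, batch_size: int, cost: float) -> None:
        self.costs[batch_size].append(cost)
```

Sampling from the posterior, rather than always taking the best sample mean, keeps exploring batch sizes whose cost is still uncertain while converging on the cheapest one as evidence accumulates.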
## Carbon-Aware Zeus
Training DNNs on GPUs consumes substantial energy and, in turn, produces substantial carbon emissions. Building on top of Zeus, we introduce *Chase*, a carbon-aware solution. *Chase* dynamically controls the energy consumption of GPUs and adapts to shifts in carbon intensity during DNN training, reducing the carbon footprint with minimal compromise on training performance. To adapt proactively rather than reactively, *Chase* uses a lightweight machine learning algorithm to forecast the carbon intensity of the upcoming time frame. For more details on Chase, please refer to our [paper](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf) and the [chase branch](https://github.com/ml-energy/zeus/tree/chase).
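The description above names Chase's forecaster only as "a lightweight machine learning algorithm", so the sketch below swaps in simple exponential smoothing as a placeholder and maps the forecast to a GPU power limit with a hypothetical linear rule. All thresholds, names, and the mapping are illustrative assumptions; the last function is the standard conversion from energy and carbon intensity to emissions.

```python
from typing import Sequence


def forecast_next(history: Sequence[float], alpha: float = 0.5) -> float:
    """Exponential smoothing forecast of the next carbon intensity (gCO2/kWh).

    Placeholder only; Chase's actual forecaster is not reproduced here.
    """
    if not history:
        raise ValueError("need at least one observation")
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level


def choose_power_limit(forecast: float, low: float, high: float,
                       min_pl: float, max_pl: float) -> float:
    """Hypothetical rule: full power on a clean grid, throttle linearly as it gets dirtier."""
    if forecast <= low:
        return max_pl
    if forecast >= high:
        return min_pl
    frac = (forecast - low) / (high - low)
    return max_pl - frac * (max_pl - min_pl)


def carbon_grams(energy_j: float, intensity_g_per_kwh: float) -> float:
    """Emissions in grams of CO2: energy in kWh (J / 3.6e6) times intensity in gCO2/kWh."""
    return energy_j / 3.6e6 * intensity_g_per_kwh


print(choose_power_limit(forecast_next([250.0, 300.0, 350.0]), 200.0, 500.0, 100.0, 300.0))  # 225.0
```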
## Citation
```bibtex
@inproceedings{zeus-nsdi23,
title = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},
author = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},
booktitle = {USENIX NSDI},
year = {2023}
}
```
## Contact
Jae-Won Chung (jwnchung@umich.edu)

Raw data
{
"_id": null,
"home_page": "",
"name": "zeus-ml",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "deep-learning,power,energy,sustainability,mlsys",
"author": "",
"author_email": "Jae-Won Chung <jwnchung@umich.edu>",
"download_url": "https://files.pythonhosted.org/packages/09/b6/7426001edca0d7992e00ebb24fe708ef60241a51b1ff455f2b04f57a1d0c/zeus-ml-0.8.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache 2.0",
"summary": "A framework for deep learning energy measurement and optimization.",
"version": "0.8.1",
"project_urls": {
"Documentation": "https://ml.energy/zeus",
"Homepage": "https://ml.energy/zeus",
"Repository": "https://github.com/ml-energy/zeus"
},
"split_keywords": [
"deep-learning",
"power",
"energy",
"sustainability",
"mlsys"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1ccc85592ea989ae26eb32d176a884890f663e3c60dd17f6f4dfdce37a341754",
"md5": "17eae7403b999ffbe6555f7c331c84a9",
"sha256": "67b618bd4b5973826c7c012e065263021712c8be4178d2f0bf8ef76ac895fee5"
},
"downloads": -1,
"filename": "zeus_ml-0.8.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "17eae7403b999ffbe6555f7c331c84a9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 192629,
"upload_time": "2024-02-25T22:03:32",
"upload_time_iso_8601": "2024-02-25T22:03:32.120166Z",
"url": "https://files.pythonhosted.org/packages/1c/cc/85592ea989ae26eb32d176a884890f663e3c60dd17f6f4dfdce37a341754/zeus_ml-0.8.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "09b67426001edca0d7992e00ebb24fe708ef60241a51b1ff455f2b04f57a1d0c",
"md5": "a40d5133c2d472d63e230cee2cbeb692",
"sha256": "df008efa8eafade527fd8a6c2c505450f11fdbdccf9458dc0d491ab48acbb97b"
},
"downloads": -1,
"filename": "zeus-ml-0.8.1.tar.gz",
"has_sig": false,
"md5_digest": "a40d5133c2d472d63e230cee2cbeb692",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 142676,
"upload_time": "2024-02-25T22:03:34",
"upload_time_iso_8601": "2024-02-25T22:03:34.582006Z",
"url": "https://files.pythonhosted.org/packages/09/b6/7426001edca0d7992e00ebb24fe708ef60241a51b1ff455f2b04f57a1d0c/zeus-ml-0.8.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-25 22:03:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ml-energy",
"github_project": "zeus",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "zeus-ml"
}