| Field | Value |
| --- | --- |
| Name | dask4dvc |
| Version | 0.2.3 |
| Summary | Use dask to run the DVC graph |
| Upload time | 2023-04-28 12:30:16 |
| Author | zincwarecode |
| Requires Python | `>=3.8,<4.0` |
| License | Apache-2.0 |
| Keywords | data-science, hpc, dask, dvc |

[![Coverage Status](https://coveralls.io/repos/github/zincware/dask4dvc/badge.svg?branch=main)](https://coveralls.io/github/zincware/dask4dvc?branch=main)
![PyTest](https://github.com/zincware/dask4dvc/actions/workflows/pytest.yaml/badge.svg)
[![PyPI version](https://badge.fury.io/py/dask4dvc.svg)](https://badge.fury.io/py/dask4dvc)
[![zincware](https://img.shields.io/badge/Powered%20by-zincware-darkcyan)](https://github.com/zincware)
# Dask4DVC - Distributed Node Execution
[DVC](https://dvc.org) provides tools for building and executing the computational graph locally through various methods.
The `dask4dvc` package combines [Dask Distributed](https://distributed.dask.org/) with DVC to make it easier to run DVC pipelines on HPC clusters managed by workload managers like [Slurm](https://github.com/SchedMD/slurm).
The `dask4dvc repro` command runs the DVC graph in parallel where possible.
Currently, `dask4dvc run` will not run stages per experiment sequentially.
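The idea of running the graph "in parallel where possible" can be illustrated without Dask. The following is a minimal standard-library sketch, not dask4dvc's implementation: the stage names and dependencies are hypothetical stand-ins for what a `dvc.yaml` would define, and `graphlib` needs Python 3.9 or newer (the package itself supports 3.8). A stage is submitted as soon as all of its upstream stages have finished, so independent stages run concurrently.

```python
"""Illustrative sketch: run a stage DAG "in parallel where possible".

Stage names and dependencies are hypothetical; in a real DVC project they
would come from dvc.yaml. Standard library only (graphlib needs Python 3.9+).
This is not dask4dvc's implementation.
"""
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(deps, run_stage, max_workers=4):
    """Run each stage once all of its upstream stages have finished.

    deps maps every stage to the set of stages it depends on.
    Returns the stages in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every stage whose dependencies are all done.
            for stage in sorter.get_ready():
                running[pool.submit(run_stage, stage)] = stage
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                stage = running.pop(future)
                future.result()  # re-raise if the stage failed
                sorter.done(stage)  # may unlock downstream stages
                finished.append(stage)
    return finished


if __name__ == "__main__":
    # Hypothetical pipeline: the two feature stages only depend on "prepare".
    pipeline = {
        "prepare": set(),
        "features_a": {"prepare"},
        "features_b": {"prepare"},
        "train": {"features_a", "features_b"},
    }
    print(run_graph(pipeline, lambda stage: print("running", stage)))
```

Here `features_a` and `features_b` may run at the same time, while `train` waits for both. Per the package summary, dask4dvc hands this kind of scheduling to Dask rather than to a local thread pool.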
> :warning: This is an experimental package **not** affiliated in any way with Iterative or DVC.
## Usage
Dask4DVC provides a CLI similar to DVC.
- `dvc repro` becomes `dask4dvc repro`.
- `dvc queue start` becomes `dask4dvc run`.

You can follow the progress using `dask4dvc <cmd> --dashboard`.
### SLURM Cluster
`dask4dvc` can be used with a Slurm cluster.
This requires a running Dask scheduler, for example one started with `dask_jobqueue`:
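```python
from dask_jobqueue import SLURMCluster

# Each Slurm job runs one Dask worker (processes=1) with one core and one GPU.
cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    # Extra #SBATCH lines; this argument was called `job_extra` before dask-jobqueue 0.8.0.
    job_extra_directives=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    # Fixed scheduler port so dask4dvc can connect via --address.
    scheduler_options={"port": 31415},
)
# Let Dask submit and cancel Slurm jobs as the workload changes.
cluster.adapt()

# Keep this process (and with it the scheduler) alive while dask4dvc runs.
input("Scheduler listening on port 31415. Press Enter to shut down...")
```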
With this setup, you can run `dask4dvc repro --address 127.0.0.1:31415` to connect to the scheduler on the example port `31415`.
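Because `dask4dvc repro --address` needs the scheduler to be up, it can help to check the port first. This is a minimal standard-library sketch; `scheduler_reachable` is a hypothetical helper, not part of dask4dvc or Dask, and it only confirms that something accepts TCP connections on the address, not that the listener is a Dask scheduler.

```python
"""Check whether anything is listening on a Dask scheduler address.

Minimal stdlib sketch; scheduler_reachable is a hypothetical helper, not part
of dask4dvc or Dask. A successful TCP connect does not prove the listener is
a Dask scheduler.
"""
import socket


def scheduler_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to "host:port" or "tcp://host:port" succeeds."""
    if "://" in address:
        address = address.split("://", 1)[1]  # drop a Dask-style protocol prefix
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host.strip("[]"), int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False  # refused, timed out, unresolvable host, or bad port


if __name__ == "__main__":
    addr = "127.0.0.1:31415"  # example port from the SLURMCluster setup
    if scheduler_reachable(addr):
        print(f"Scheduler reachable; run: dask4dvc repro --address {addr}")
    else:
        print(f"Nothing is listening on {addr}; is the cluster script still running?")
```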
You can also pass a config file with `dask4dvc repro --config myconfig.yaml`.
All `dask.distributed`-compatible clusters should be supported, for example:
```yaml
default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
```
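The loader below is a hypothetical sketch, not dask4dvc's code: it assumes `default` names a profile and `SGECluster` names a cluster class whose constructor receives the nested values as keyword arguments, i.e. `dask_jobqueue.SGECluster(queue="regular", cores=10, memory="16 GB")`. The helper names and the module search order are also assumptions.

```python
"""Hypothetical sketch: turn a {profile: {ClassName: kwargs}} config into a
Dask cluster. This is NOT dask4dvc's loader; names and lookup order are
assumptions made for illustration.
"""
import importlib

# Modules searched, in order, for the configured cluster class (assumption).
CLUSTER_MODULES = ("dask_jobqueue", "dask.distributed")


def parse_cluster_config(config, profile="default"):
    """Return (class_name, kwargs) for the single cluster entry in a profile."""
    entry = config[profile]
    if len(entry) != 1:
        raise ValueError(f"expected exactly one cluster class under {profile!r}")
    class_name, kwargs = next(iter(entry.items()))
    return class_name, dict(kwargs or {})


def make_cluster(config, profile="default"):
    """Import the named cluster class and call it with the configured kwargs."""
    class_name, kwargs = parse_cluster_config(config, profile)
    for module_name in CLUSTER_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # optional dependency not installed
        cluster_cls = getattr(module, class_name, None)
        if cluster_cls is not None:
            return cluster_cls(**kwargs)
    raise LookupError(f"no cluster class named {class_name!r} found")


# The YAML example above, as yaml.safe_load would return it.
EXAMPLE_CONFIG = {
    "default": {"SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"}}
}
```

Calling `make_cluster(EXAMPLE_CONFIG)` would only succeed on a machine with `dask_jobqueue` installed and access to an SGE scheduler; `parse_cluster_config` can be checked anywhere.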
![dask4dvc repro](https://raw.githubusercontent.com/zincware/dask4dvc/main/misc/dask4dvc_1.gif "dask4dvc repro")
Raw data
{
"_id": null,
"home_page": "",
"name": "dask4dvc",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "data-science,HPC,dask,DVC",
"author": "zincwarecode",
"author_email": "zincwarecode@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f7/b6/18163d26a00668f314f1d3c3146ec93fa9f1fe78e0253ae0cad0216c4a3a/dask4dvc-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Use dask to run the DVC graph",
"version": "0.2.3",
"split_keywords": [
"data-science",
"hpc",
"dask",
"dvc"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "17860e1d09f8e95893fe21b2715eaccbbc287baa7650dc7fe079172827d136ae",
"md5": "cb7a3a80f311e9537bd9ba651debf1b7",
"sha256": "e5adf2f493794d8f5750d32ce3ed834859d2826491d3c18c987b267b553ddc82"
},
"downloads": -1,
"filename": "dask4dvc-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cb7a3a80f311e9537bd9ba651debf1b7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 12831,
"upload_time": "2023-04-28T12:30:14",
"upload_time_iso_8601": "2023-04-28T12:30:14.682083Z",
"url": "https://files.pythonhosted.org/packages/17/86/0e1d09f8e95893fe21b2715eaccbbc287baa7650dc7fe079172827d136ae/dask4dvc-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f7b618163d26a00668f314f1d3c3146ec93fa9f1fe78e0253ae0cad0216c4a3a",
"md5": "213e2c010bfddc9491583f0512e13472",
"sha256": "de696c0c9e79f5583a4352434bee41f321113bb19f8a7303fa3627a82bc3accb"
},
"downloads": -1,
"filename": "dask4dvc-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "213e2c010bfddc9491583f0512e13472",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 11232,
"upload_time": "2023-04-28T12:30:16",
"upload_time_iso_8601": "2023-04-28T12:30:16.370058Z",
"url": "https://files.pythonhosted.org/packages/f7/b6/18163d26a00668f314f1d3c3146ec93fa9f1fe78e0253ae0cad0216c4a3a/dask4dvc-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-28 12:30:16",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "dask4dvc"
}