dask4dvc


Name: dask4dvc
Version: 0.2.3
Summary: Use dask to run the DVC graph
Upload time: 2023-04-28 12:30:16
Author: zincwarecode
Requires Python: >=3.8,<4.0
License: Apache-2.0
Keywords: data-science, HPC, dask, DVC
            [![Coverage Status](https://coveralls.io/repos/github/zincware/dask4dvc/badge.svg?branch=main)](https://coveralls.io/github/zincware/dask4dvc?branch=main)
![PyTest](https://github.com/zincware/dask4dvc/actions/workflows/pytest.yaml/badge.svg)
[![PyPI version](https://badge.fury.io/py/dask4dvc.svg)](https://badge.fury.io/py/dask4dvc)
[![zincware](https://img.shields.io/badge/Powered%20by-zincware-darkcyan)](https://github.com/zincware)

# Dask4DVC - Distributed Node Execution
[DVC](https://dvc.org) provides tools for building and executing the computational graph locally through various methods.
The `dask4dvc` package combines [Dask Distributed](https://distributed.dask.org/) with DVC to make it easier to use with HPC managers like [Slurm](https://github.com/SchedMD/slurm).

The `dask4dvc repro` command runs the DVC graph in parallel where possible.
Currently, `dask4dvc run` does not run the stages within each experiment sequentially.
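The scheduling idea can be sketched with the standard-library `graphlib`: stages whose dependencies are all finished become ready at the same time and can be dispatched to workers concurrently. The stage names below are hypothetical, not taken from a real `dvc.yaml`:

```python
from graphlib import TopologicalSorter

# Hypothetical DVC stage graph: each stage maps to the stages it depends on.
stages = {
    "preprocess": set(),
    "train_a": {"preprocess"},
    "train_b": {"preprocess"},
    "evaluate": {"train_a", "train_b"},
}

sorter = TopologicalSorter(stages)
sorter.prepare()
while sorter.is_active():
    ready = sorter.get_ready()  # stages whose dependencies are all done
    print(sorted(ready))        # these could run in parallel on dask workers
    sorter.done(*ready)
```

Here `train_a` and `train_b` become ready in the same batch and could run on two workers at once, while `evaluate` has to wait for both.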

> :warning: This is an experimental package **not** affiliated in any way with Iterative or DVC.

## Usage
Dask4DVC provides a CLI similar to DVC.

- `dvc repro` becomes `dask4dvc repro`.
- `dvc queue start` becomes `dask4dvc run`.

You can follow the progress using `dask4dvc <cmd> --dashboard`.


### SLURM Cluster

You can use `dask4dvc` with a SLURM cluster.
This requires a running Dask scheduler:
```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    job_extra=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    scheduler_options={"port": 31415},  # fixed port so dask4dvc can connect
)
cluster.adapt()  # scale the number of workers to the workload
```

With this setup you can then run `dask4dvc repro --address 127.0.0.1:31415`, using the example port `31415`.

You can also use config files with `dask4dvc repro --config myconfig.yaml`.
All `dask.distributed` clusters should be supported.

```yaml
default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
```
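How such a config could be interpreted can be sketched in plain Python: the key under a profile names a `dask.distributed`/`dask_jobqueue` cluster class, and its value becomes the keyword arguments for that class. This resolution logic is an illustration, not dask4dvc's actual implementation:

```python
# Illustrative only: resolve a cluster class name and its kwargs from a
# config dict shaped like the YAML example above.
config = {
    "default": {
        "SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"},
    }
}

profile = config["default"]
cluster_name, cluster_kwargs = next(iter(profile.items()))
print(cluster_name)    # name of the cluster class to instantiate
print(cluster_kwargs)  # keyword arguments passed to that class
```

The resolved name would then be looked up among the available cluster classes (e.g. `SGECluster`, `SLURMCluster`) and instantiated with the keyword arguments.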

![dask4dvc repro](https://raw.githubusercontent.com/zincware/dask4dvc/main/misc/dask4dvc_1.gif "dask4dvc repro")
            
