ZnTrack


- Name: ZnTrack
- Version: 0.7.3
- Summary: Create, Run and Benchmark DVC Pipelines in Python
- Upload time: 2024-05-14 14:41:55
- Author: zincwarecode
- Requires Python: <4.0.0,>=3.9
- License: Apache-2.0
- Keywords: data-science, data-version-control, machine-learning, reproducibility, collaboration

[![coveralls](https://coveralls.io/repos/github/zincware/ZnTrack/badge.svg)](https://coveralls.io/github/zincware/ZnTrack)
[![codecov](https://codecov.io/gh/zincware/ZnTrack/branch/main/graph/badge.svg?token=ZQ67FXN1IT)](https://codecov.io/gh/zincware/ZnTrack)
[![Maintainability](https://api.codeclimate.com/v1/badges/f25e119bbd5d5ec74e2c/maintainability)](https://codeclimate.com/github/zincware/ZnTrack/maintainability)
![PyTest](https://github.com/zincware/ZnTrack/actions/workflows/test.yaml/badge.svg)
[![PyPI version](https://badge.fury.io/py/zntrack.svg)](https://badge.fury.io/py/zntrack)
[![code-style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black/)
[![Documentation](https://readthedocs.org/projects/zntrack/badge/?version=latest)](https://zntrack.readthedocs.io/en/latest/?badge=latest)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/zincware/ZnTrack/HEAD)
[![DOI](https://img.shields.io/badge/arXiv-2401.10603-red)](https://arxiv.org/abs/2401.10603)
[![ZnTrack](https://img.shields.io/badge/Powered%20by-ZnTrack-%23007CB0)](https://zntrack.readthedocs.io/en/latest/)
[![zincware](https://img.shields.io/badge/Powered%20by-zincware-darkcyan)](https://github.com/zincware)

![Logo](https://raw.githubusercontent.com/zincware/ZnTrack/main/docs/source/_static/logo_ZnTrack.png)

# ZnTrack: A Parameter Tracking Package for Python

ZnTrack `zɪŋk træk` is a lightweight and easy-to-use package for tracking
parameters in your Python projects using DVC. With ZnTrack, you can define
parameters in Python classes and monitor how they change over time. This
information can then be used to compare the results of different runs, identify
computational bottlenecks, and avoid the re-running of code components where
parameters have not changed.
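
The idea of skipping work when parameters have not changed can be illustrated with a small, self-contained sketch. It is not how ZnTrack or DVC are implemented. The `fingerprint` and `run_if_changed` helpers and the in-memory `cache` are hypothetical names used only for illustration, whereas DVC keeps this kind of state on disk.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory cache: stage name -> (parameter fingerprint, result).
cache: Dict[str, tuple] = {}


def fingerprint(params: Dict[str, Any]) -> str:
    """Return a stable hash of a JSON-serializable parameter dictionary."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_if_changed(name: str, params: Dict[str, Any], func: Callable[..., Any]) -> Any:
    """Run ``func(**params)`` only if ``params`` differ from the last run."""
    key = fingerprint(params)
    if name in cache and cache[name][0] == key:
        return cache[name][1]  # parameters unchanged: reuse the stored result
    result = func(**params)
    cache[name] = (key, result)
    return result


calls = []


def square(x: int) -> int:
    calls.append(x)  # record every real execution
    return x * x


run_if_changed("square", {"x": 4}, square)  # executes square
run_if_changed("square", {"x": 4}, square)  # skipped, parameters unchanged
run_if_changed("square", {"x": 5}, square)  # executes again, parameter changed
print(calls)
```

Only two of the three calls execute `square`, because the second call reuses the result stored for the same parameter fingerprint.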

## Key Features

- Parameter, output and metric tracking: ZnTrack makes it easy to store and
  track the values of parameters in your Python code. It further allows you to
  store any outputs produced and gives an easy interface to define metrics.
- Lightweight and database-free: Unlike other parameter tracking solutions,
  ZnTrack is lightweight and does not require any databases.

## Getting Started

To get started with ZnTrack, you can install it via pip: `pip install zntrack`

Next, you can start using ZnTrack to track parameters, outputs and metrics in
your Python code. The following example tracks the value of a parameter in a
Python class. Start in an empty directory and run `git init` and `dvc init` to
prepare it.

Then put the following into a Python file called `hello_world.py` and run it
with `python hello_world.py`.

```python
import zntrack
from random import randrange


class HelloWorld(zntrack.Node):
    """Define a ZnTrack Node"""
    # parameter to be tracked
    max_number: int = zntrack.params()
    # parameter to store as output
    random_number: int = zntrack.outs()

    def run(self):
        """Command to be run by DVC"""
        self.random_number = randrange(self.max_number)

if __name__ == "__main__":
    # Write the computational graph
    with zntrack.Project() as project:
        hello_world = HelloWorld(max_number=512)
    project.run()
```

Running the script creates a [DVC](https://dvc.org) stage `HelloWorld`. The
workflow is defined in `dvc.yaml` and the parameters are stored in
`params.yaml`.

The `project.run()` call then runs the workflow with `dvc repro` automatically.
Once the graph has been executed, the results, i.e. the random number, can be
accessed directly from the Node object.

```python
hello_world.load()
print(hello_world.random_number)
```

> ## Tip
>
> You can easily load this Node directly from a repository.
>
> ```python
> import zntrack
>
> node = zntrack.from_rev(
>     "HelloWorld",
>     remote="https://github.com/PythonFZ/ZnTrackExamples.git",
>     rev="890c714",
> )
> ```
>
> Try accessing the `max_number` parameter and `random_number` output. All Nodes
> from this and many other repositories can be loaded like this.

An overview of all the ZnTrack features as well as more detailed examples can be
found in the [ZnTrack Documentation](https://zntrack.readthedocs.io/en/latest/).

## Wrap Python Functions

ZnTrack also provides tools to convert a Python function into a DVC Node. This
approach is much more lightweight than the class-based approach, but it offers
only a reduced set of functionality. It is therefore recommended for smaller
nodes that do not need the additional tooling of the class-based approach.

```python
import pathlib

import zntrack
from zntrack import NodeConfig, nodify


@nodify(outs=pathlib.Path("text.txt"), params={"text": "Lorem Ipsum"})
def write_text(cfg: NodeConfig):
    # write the tracked parameter to the tracked output file
    cfg.outs.write_text(cfg.params.text)


# build the DVC graph
with zntrack.Project() as project:
    write_text()
project.run()
```

The `cfg` dataclass passed to the function provides access to all configured
files and parameters via [dot4dict](https://github.com/zincware/dot4dict). The
function body is executed by the `dvc repro` command or when run via
`write_text(run=True)`. All parameters are loaded from or stored in
`params.yaml`.
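
The attribute-style access used in `cfg.params.text` can be illustrated with a minimal sketch. The `DotDict` class below is a hypothetical stand-in written for this example. It is not the dot4dict implementation and only shows the general idea of reading dictionary keys as attributes.

```python
class DotDict(dict):
    """A dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so regular dict methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


params = DotDict({"text": "Lorem Ipsum"})
print(params.text)     # attribute-style access
print(params["text"])  # regular dictionary access still works
```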

# Technical Details

## ZnTrack as an Object-Relational Mapping for DVC

On a fundamental level, the ZnTrack package provides an easy-to-use interface
for DVC directly from Python. It handles the overhead of reading config files,
defining outputs in `dvc.yaml` as well as in the script, and much more.
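
The mapping idea can be sketched with plain dataclasses. This is a conceptual illustration under stated assumptions, not ZnTrack's implementation. The `param` and `out` helpers, the `ToyNode` base class and the dictionary layout returned by `to_stage` are all hypothetical, and the layout does not reproduce the actual `dvc.yaml` schema.

```python
from dataclasses import dataclass, field, fields


def param(default=None):
    """Mark a dataclass field as a tracked parameter (hypothetical helper)."""
    return field(default=default, metadata={"kind": "params"})


def out(default=None):
    """Mark a dataclass field as a stage output (hypothetical helper)."""
    return field(default=default, metadata={"kind": "outs"})


@dataclass
class ToyNode:
    """Base class that can describe itself as a stage dictionary."""

    def to_stage(self) -> dict:
        stage = {"params": {}, "outs": []}
        for f in fields(self):
            kind = f.metadata.get("kind")
            if kind == "params":
                # parameters are stored together with their current value
                stage["params"][f.name] = getattr(self, f.name)
            elif kind == "outs":
                # outputs are only listed by name
                stage["outs"].append(f.name)
        return {type(self).__name__: stage}


@dataclass
class ToyHelloWorld(ToyNode):
    max_number: int = param()
    random_number: int = out()


print(ToyHelloWorld(max_number=512).to_stage())
```

The class attributes play the role of columns in an object-relational mapping: declaring them once in Python is enough to derive the stage description.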

For more information on DVC visit their [homepage](https://dvc.org/doc).

# References

If you use ZnTrack in your research and find it helpful, please cite us.

```bibtex
@misc{zillsZnTrackDataCode2024,
  title = {{{ZnTrack}} -- {{Data}} as {{Code}}},
  author = {Zills, Fabian and Sch{\"a}fer, Moritz and Tovey, Samuel and K{\"a}stner, Johannes and Holm, Christian},
  year = {2024},
  eprint={2401.10603},
  archivePrefix={arXiv},
}
```

# Copyright

This project is distributed under the
[Apache License Version 2.0](https://github.com/zincware/ZnTrack/blob/main/LICENSE).

## Similar Tools

The following (incomplete) list contains other projects that either work
together with ZnTrack or achieve similar results with slightly different goals
or programming languages.

- [DVC](https://dvc.org/) - Main dependency of ZnTrack for Data Version Control.
- [dvthis](https://github.com/jcpsantiago/dvthis) - Introduce DVC to R.
- [DAGsHub Client](https://github.com/DAGsHub/client) - Logging parameters from
  within Python.
- [MLFlow](https://mlflow.org/) - A Machine Learning Lifecycle Platform.
- [Metaflow](https://metaflow.org/) - A framework for real-life data science.
- [Hydra](https://hydra.cc/) - A framework for elegantly configuring complex
  applications.
- [Snakemake](https://snakemake.readthedocs.io/en/stable/) - Workflow management
  system to create reproducible and scalable data analyses.

            
