taskgraph


Name: taskgraph
Version: 0.11.1
home_page: https://github.com/natcap/taskgraph
Summary: Parallel task graph framework
upload_time: 2023-10-27 20:59:12
maintainer: Natural Capital Project Software Team
docs_url: None
requires_python: >=3.6
license: In this license, "Natural Capital Project" is defined as the parties of Stanford University, The Nature Conservancy, World Wildlife Fund Inc., and University of Minnesota. This tool has an open license. All people are invited to use the tool under the following conditions and terms: Copyright (c) 2020, Natural Capital Project All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Natural Capital Project nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
keywords: parallel, multiprocessing, distributed, computing
===============
About TaskGraph
===============

``TaskGraph`` is a library that was developed to help manage complicated
computational software pipelines consisting of long-running individual tasks.
Many of these tasks could be executed in parallel, almost all of them wrote
results to disk, and results from one part of the pipeline could often be
reused. TaskGraph manages all of this for you. With it you can schedule
tasks with dependencies, avoid recomputing results that have already been
computed, and allot multiple CPU cores to execute tasks in parallel if
desired.
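
To make the idea of scheduling tasks with dependencies concrete, here is a
minimal standard-library sketch. It is not TaskGraph's implementation and does
not use its API: ``run_graph``, ``tasks``, and ``dependencies`` are names made
up for this illustration, and it assumes Python 3.9 or later for ``graphlib``.
It only shows the general technique of starting each task once everything it
depends on has finished, while independent tasks run concurrently.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(tasks, dependencies, n_workers=4):
    """Run callables in dependency order, in parallel where possible.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the names it depends on.
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(
        {name: dependencies.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished_order = []
    running = {}  # maps each submitted future to its task name
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                name = running.pop(future)
                finished_order.append(name)
                sorter.done(name)  # unblocks tasks waiting on this one
    return finished_order


# Hypothetical example: two independent tasks feeding a third.
results = {}
tasks = {
    'a': lambda: results.update(a=[5] * 3),
    'b': lambda: results.update(b=[10] * 3),
    'sum': lambda: results.update(
        sum=[x + y for x, y in zip(results['a'], results['b'])]),
}
order = run_graph(tasks, {'sum': {'a', 'b'}})
```

In this sketch ``'a'`` and ``'b'`` may finish in either order, but ``'sum'``
always runs last because it is only released after both are marked done.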

TaskGraph Dependencies
----------------------

TaskGraph is written in pure Python, but if the ``psutil`` package is
installed, the multiprocessing worker processes will be ``nice``\d (run at a
lower scheduling priority).
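
As an illustration of what an optional ``psutil`` dependency can look like, the
sketch below lowers the priority of the calling process only when ``psutil``
can be imported. This is an assumption about the general pattern, not a copy of
TaskGraph's code; ``lower_process_priority`` is a hypothetical helper name and
the amount by which the priority is lowered is arbitrary.

```python
import os

try:
    import psutil  # optional: only used to adjust process priority
except ImportError:
    psutil = None


def lower_process_priority():
    """Lower this process's scheduling priority if psutil is available.

    Returns True if a lower priority was applied, False if psutil is missing.
    """
    if psutil is None:
        return False
    process = psutil.Process(os.getpid())
    if os.name == 'nt':
        # Windows uses priority classes rather than numeric niceness.
        process.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
    else:
        # On POSIX a higher niceness means a lower priority; cap at 19.
        process.nice(min(process.nice() + 1, 19))
    return True
```

A worker process could call such a helper once at startup so that long-running
tasks compete less aggressively with interactive work on the same machine.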

Example Use
-----------

Install ``TaskGraph`` with

``pip install taskgraph``

Then define some tasks, declare their dependencies, and let the graph run them:

.. code-block:: python

  import os
  import pickle
  import logging

  import taskgraph

  logging.basicConfig(level=logging.DEBUG)

  def _create_list_on_disk(value, length, target_path):
      """Pickle a list of `length` copies of `value` to `target_path`."""
      target_list = [value] * length
      with open(target_path, 'wb') as target_file:
          pickle.dump(target_list, target_file)


  def _sum_lists_from_disk(list_a_path, list_b_path, target_path):
      """Read two pickled lists, add them elementwise, and save the result."""
      with open(list_a_path, 'rb') as list_a_file:
          list_a = pickle.load(list_a_file)
      with open(list_b_path, 'rb') as list_b_file:
          list_b = pickle.load(list_b_file)
      target_list = [a + b for a, b in zip(list_a, list_b)]
      with open(target_path, 'wb') as target_file:
          pickle.dump(target_list, target_file)

  # create a taskgraph that uses 4 multiprocessing subprocesses when possible
  if __name__ == '__main__':
      workspace_dir = 'workspace'
      task_graph = taskgraph.TaskGraph(workspace_dir, 4)
      target_a_path = os.path.join(workspace_dir, 'a.dat')
      target_b_path = os.path.join(workspace_dir, 'b.dat')
      result_path = os.path.join(workspace_dir, 'result.dat')
      value_a = 5
      value_b = 10
      list_len = 10
      task_a = task_graph.add_task(
          func=_create_list_on_disk,
          args=(value_a, list_len, target_a_path),
          target_path_list=[target_a_path])
      task_b = task_graph.add_task(
          func=_create_list_on_disk,
          args=(value_b, list_len, target_b_path),
          target_path_list=[target_b_path])
      sum_task = task_graph.add_task(
          func=_sum_lists_from_disk,
          args=(target_a_path, target_b_path, result_path),
          target_path_list=[result_path],
          dependent_task_list=[task_a, task_b])

      task_graph.close()
      task_graph.join()

      # expect `result` to be a list of length `list_len` in which every
      # element equals `value_a + value_b`
      with open(result_path, 'rb') as result_file:
          result = pickle.load(result_file)


Caveats
-------

* TaskGraph's default method of checking whether a file has changed
  (``hash_algorithm='sizetimestamp'``) uses the filesystem's modification
  timestamp, interpreted in integer nanoseconds.  This check is only as
  precise as the filesystem's timestamp resolution, so a change that keeps
  the file size the same and lands within one timestamp tick can go
  unnoticed.  For example:

  * FAT and FAT32 timestamps have a 2-second modification timestamp resolution
  * exFAT has a 10 millisecond timestamp resolution
  * NTFS has a 100 nanosecond timestamp resolution
  * HFS+ has a 1 second timestamp resolution
  * APFS has a 1 nanosecond timestamp resolution
  * ext3 has a 1 second timestamp resolution
  * ext4 has a 1 nanosecond timestamp resolution

  If you suspect timestamp resolution to be an issue on your filesystem, you
  may wish to store your files on a filesystem with a finer timestamp
  resolution or else consider using a different ``hash_algorithm``.
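
The limitation can be reproduced with the standard library alone. The sketch
below is an illustration, not TaskGraph's own code: ``size_timestamp_fingerprint``
and ``content_fingerprint`` are hypothetical helpers, and restoring the old
timestamp with ``os.utime`` stands in for two writes that fall within the same
timestamp tick on a coarse filesystem.

```python
import hashlib
import os
import tempfile


def size_timestamp_fingerprint(path):
    """Return (size in bytes, modification time in integer nanoseconds)."""
    stats = os.stat(path)
    return (stats.st_size, stats.st_mtime_ns)


def content_fingerprint(path, algorithm='sha256'):
    """Return a hex digest of the file's bytes using a hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as file_obj:
        for block in iter(lambda: file_obj.read(2**20), b''):
            digest.update(block)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, 'data.txt')
    with open(path, 'w') as file_obj:
        file_obj.write('aaaa')
    original_stats = os.stat(path)
    old_cheap = size_timestamp_fingerprint(path)
    old_hash = content_fingerprint(path)

    # Write different content of the same size, then restore the original
    # timestamps to mimic a second write inside the same timestamp tick.
    with open(path, 'w') as file_obj:
        file_obj.write('bbbb')
    os.utime(
        path, ns=(original_stats.st_atime_ns, original_stats.st_mtime_ns))

    cheap_check_sees_change = size_timestamp_fingerprint(path) != old_cheap
    hash_check_sees_change = content_fingerprint(path) != old_hash
```

Here the size-and-timestamp comparison misses the change while the content
hash catches it, at the cost of reading the whole file.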


Running Tests
-------------

TaskGraph includes a ``tox`` configuration for automating test runs across
multiple Python versions, both with and without ``psutil`` installed.  To
execute all tests in all configured environments, run::

    $ tox

Alternatively, to run the tests on a single configuration (say, Python 3.7
without ``psutil``), you'd run::

    $ tox -e py37

Or if you'd like to run the tests for the combination of Python 3.7 with
``psutil``, you'd run::

    $ tox -e py37-psutil

If you don't have multiple Python installations already available on your
system, an easy way to get them is ``tox-conda``
(https://github.com/tox-dev/tox-conda), which uses conda environments to manage
the available versions of Python::

    $ pip install tox-conda
    $ tox

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/natcap/taskgraph",
    "name": "taskgraph",
    "maintainer": "Natural Capital Project Software Team",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "jamesdouglassusa@gmail.com",
    "keywords": "parallel,multiprocessing,distributed,computing",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/1f/a6/0e8b2eaaf5f2d307e60a93b75f7df586ae59fa44a8428b690a06678cf28e/taskgraph-0.11.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "In this license, \"Natural Capital Project\" is defined as the parties of Stanford University, The Nature Conservancy, World Wildlife Fund Inc., and University of Minnesota.  This tool has an open license. All people are invited to use the tool under the following conditions and terms:  Copyright (c) 2020, Natural Capital Project  All rights reserved.  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  * Neither the name of Natural Capital Project nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ",
    "summary": "Parallel task graph framework",
    "version": "0.11.1",
    "project_urls": {
        "Homepage": "https://github.com/natcap/taskgraph"
    },
    "split_keywords": [
        "parallel",
        "multiprocessing",
        "distributed",
        "computing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "27c4b2b88d64a6b369fd9869725e32ee45770741c4e93a8365915918a20dfaeb",
                "md5": "0c4aa094ff6f6b176781989c0b8b0780",
                "sha256": "32f4c98f89d06a210ab473d14c03fd807543c469e2b6ac191376d4b617ff675c"
            },
            "downloads": -1,
            "filename": "taskgraph-0.11.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0c4aa094ff6f6b176781989c0b8b0780",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 22965,
            "upload_time": "2023-10-27T20:59:10",
            "upload_time_iso_8601": "2023-10-27T20:59:10.413532Z",
            "url": "https://files.pythonhosted.org/packages/27/c4/b2b88d64a6b369fd9869725e32ee45770741c4e93a8365915918a20dfaeb/taskgraph-0.11.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1fa60e8b2eaaf5f2d307e60a93b75f7df586ae59fa44a8428b690a06678cf28e",
                "md5": "dd003c8c598f631eb3a5f367605a791b",
                "sha256": "536cf4fc4dde6ae9f953363b52917f3eb961313178053694a154d872b5f3fc3d"
            },
            "downloads": -1,
            "filename": "taskgraph-0.11.1.tar.gz",
            "has_sig": false,
            "md5_digest": "dd003c8c598f631eb3a5f367605a791b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 42447,
            "upload_time": "2023-10-27T20:59:12",
            "upload_time_iso_8601": "2023-10-27T20:59:12.332142Z",
            "url": "https://files.pythonhosted.org/packages/1f/a6/0e8b2eaaf5f2d307e60a93b75f7df586ae59fa44a8428b690a06678cf28e/taskgraph-0.11.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-27 20:59:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "natcap",
    "github_project": "taskgraph",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "taskgraph"
}
        