:Name: yawns
:Version: 0.3.1
:Summary: Yet Another Workflow Engine, a subprocess-based DAG execution system
:Requires-Python: >=3.6
:Keywords: task, execution, subprocess, dag, workflow
:Upload time: 2025-07-16 23:26:36

YAWN: Yet Another Workflow Engine
=================================

YAWN provides a framework for executing a set of shell commands with dependencies
in a distributed manner and on a repeating schedule. Other tools do similar things and
inspired this one, particularly Celery_ and Airflow_.

Browse it live at https://yawn.live, deployed on GKE_.

.. _Celery: http://www.celeryproject.org/
.. _Airflow: https://airflow.incubator.apache.org/
.. _GKE: https://github.com/aclowes/yawn-gke

.. image:: https://circleci.com/gh/aclowes/yawn/tree/master.svg?style=svg
  :target: https://circleci.com/gh/aclowes/yawn/tree/master
.. image:: https://codecov.io/gh/aclowes/yawn/branch/master/graph/badge.svg
  :target: https://codecov.io/gh/aclowes/yawn

Principal Differences
---------------------

YAWN is inspired by, but different from, Celery and Airflow because it:

* Runs each task in a separate subprocess, like Airflow but unlike Celery. This avoids pollution
  of a shared Python interpreter and makes memory usage easier to reason about.

* Uses PostgreSQL as the message broker and database, removing the need for a separate broker like
  Redis or RabbitMQ. This avoids the `visibility timeout`_ issue that arises when using Redis as a
  Celery broker. YAWN uses PostgreSQL's ``SELECT ... FOR UPDATE SKIP LOCKED`` statement to
  efficiently select from the queue table.

.. _visibility timeout: http://docs.celeryproject.org/en/latest/getting-started/brokers/redis.html#id1

* Stores the command, environment variables, stdout, and stderr for each task execution,
  so it's easier to see the logs and history of what happened. Re-running a task does not overwrite
  the previous run.

* Does not support inputs or outputs other than the command line and environment variables, with the
  intention that client applications should handle state instead.
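The ``SKIP LOCKED`` dequeue pattern mentioned above can be sketched as follows. This is an
illustration of the technique, not YAWN's actual schema: the ``task`` table and its ``id`` and
``status`` columns are hypothetical, and the connection is any DB-API connection to PostgreSQL::

    # Sketch of a SKIP LOCKED dequeue, assuming a hypothetical "task" table
    # with "id" and "status" columns.
    CLAIM_TASK_SQL = """
        UPDATE task SET status = 'running'
        WHERE id = (
            SELECT id FROM task
            WHERE status = 'queued'
            ORDER BY id
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        )
        RETURNING id
    """

    def claim_next_task(connection):
        """Atomically claim one queued task; return its id, or None if empty.

        Concurrent workers running this statement never block each other:
        rows locked by another transaction are simply skipped.
        """
        with connection.cursor() as cursor:
            cursor.execute(CLAIM_TASK_SQL)
            row = cursor.fetchone()
        connection.commit()
        return row[0] if row else None

Because locked rows are skipped rather than waited on, many workers can poll the same queue
table without contending with each other.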

Components
----------

Web Server
  The website provides a user interface to view the workflows and tasks running within them.
  It allows you to run an existing workflow or re-run a failed task. The web server also provides
  a REST API to remotely create and run workflows.

Worker
  The worker schedules and executes tasks. The worker uses ``subprocess.Popen`` to run tasks and
  capture stdout and stderr.
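A minimal sketch of how a worker might run a command with ``subprocess.Popen``, capturing the
exit code, stdout, and stderr. This is illustrative only, not YAWN's actual worker code::

    import subprocess

    def run_task(command, environment=None, timeout=None):
        """Run a shell command the way a worker might.

        Returns (exit_code, stdout, stderr); kills the process on timeout.
        Passing an ``environment`` dict replaces the inherited environment.
        """
        process = subprocess.Popen(
            command,
            shell=True,
            env=environment,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output bytes to text
        )
        try:
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
        return process.returncode, stdout, stderr

For example, ``run_task('echo hello')`` returns an exit code of 0 with ``hello`` on stdout.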

Concepts
--------

Workflow
  A set of Tasks that can depend on each other, forming what is popularly known as a directed
  acyclic graph (DAG). Workflows can be scheduled to run on a regular basis and they are versioned
  so they can change over time.

Run
  An instance of a workflow, manually triggered or scheduled.

Task
  A shell command that specifies the upstream tasks it depends on, the number of times to retry, and a
  timeout. The task is given environment variables configured in the workflow and run.

Execution
  A single execution of a Task's command, capturing the exit code and standard output and error.

Queue
  A first-in, first-out list of Tasks to execute.

Worker
  A process that reads from a set of Queues and executes the associated Tasks, recording the
  results in an Execution.
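The execution order these concepts imply (a task runs only after all of its upstream tasks) can
be demonstrated with the standard library. This is not YAWN code, just an illustration of DAG
ordering, using the task names from the Python API example below::

    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    # Each task maps to the set of tasks it depends on (its upstream tasks).
    dependencies = {
        'start': set(),
        'task2': {'start'},
        'task3': {'start'},
        'done': {'task2', 'task3'},
    }

    # static_order() yields tasks so every task follows its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())

Here ``start`` always comes first and ``done`` always comes last; ``task2`` and ``task3`` have no
ordering between them and could run in parallel on separate workers.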

Installation
------------

To get started using YAWN::

    # install the package - someone already took the name 'yawn' :-(
    pip install yawns

    # install postgres and create the yawn database
    # the default settings are localhost and no password
    createdb yawn

    # setup the tables by running db migrations
    yawn migrate

    # create a user to login with
    yawn createsuperuser

    # create some sample workflows
    yawn examples

    # start the webserver on port 8000
    yawn webserver

    # open a new terminal and start the worker
    yawn worker

Here is a screenshot of the page for a single workflow:

.. image:: https://cloud.githubusercontent.com/assets/910316/21969288/fe40baf0-db51-11e6-97f2-7e6875c1e575.png

REST API
--------

Browse the API by going to http://127.0.0.1:8000/api/ in a browser.

When creating a workflow, the format is (shown as YAML for readability)::

    name: Example
    parameters:
      ENVIRONMENT: production
      CALCULATION_DATE: 2017-01-01
    schedule: 0 0 *
    schedule_active: True

    tasks:
    - name: task_1
      queue: default
      max_retries: 1
      timeout: 30
      command: python my_awesome_program.py $ENVIRONMENT
    - name: task_2
      queue: default
      command: echo $CALCULATION_DATE | grep 2017
      upstream:
      - task_1

``/api/workflows/``
  GET a list of versions or a single workflow version. POST to create or update a workflow
  using the schema shown above. PATCH to change the ``schedule``, ``schedule_active``, or
  ``parameters`` fields only.

  * POST - use the schema shown above
  * PATCH ``{"schedule_active": false}``

``/api/runs/``
  GET a list of runs, optionally filtering to a particular workflow using ``?workflow=<id>``.
  POST to create a new run. PATCH to change the parameters.

  * POST ``{"workflow_id": 1, "parameters": null}``
  * PATCH ``{"parameters": {"ENVIRONMENT": "test"}}``

``/api/tasks/<id>/``
  GET a single task from a workflow run, and its executions with their status and logging
  information. PATCH to enqueue a task or kill a running execution.

  * PATCH ``{"enqueue": true}``
  * PATCH ``{"terminate": <execution_id>}``
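For example, triggering a run from Python with only the standard library might look like this.
The base URL assumes a local ``yawn webserver``; authentication is omitted for brevity::

    import json
    import urllib.request

    def build_run_request(base_url, workflow_id, parameters=None):
        """Build a POST request for /api/runs/, as described above."""
        body = json.dumps({'workflow_id': workflow_id, 'parameters': parameters})
        return urllib.request.Request(
            base_url + '/api/runs/',
            data=body.encode('utf-8'),
            headers={'Content-Type': 'application/json'},
            method='POST',
        )

    request = build_run_request('http://127.0.0.1:8000', workflow_id=1)
    # urllib.request.urlopen(request) would send it to a running server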

Python API
----------

Import and use the Django models to create your workflow::

    from yawn.workflow.models import WorkflowName
    from yawn.task.models import Template

    name, _ = WorkflowName.objects.get_or_create(name='Simple Workflow Example')
    workflow = name.new_version(parameters={'MY_OBJECT_ID': '1', 'SOME_SETTING': 'false'})
    task1 = Template.objects.create(workflow=workflow, name='start', command='echo Starting...')
    task2 = Template.objects.create(workflow=workflow, name='task2', command='echo Working on $MY_OBJECT_ID')
    task2.upstream.add(task1)
    task3 = Template.objects.create(workflow=workflow, name='task3',
                                    command='echo Another busy thing && sleep 20')
    task3.upstream.add(task1)
    task4 = Template.objects.create(workflow=workflow, name='done', command='echo Finished!')
    task4.upstream.add(task2, task3)

    workflow.submit_run(parameters={'child': 'true'})

Alternatively, use the serializer to give tasks as a dictionary in the format used
by the API. This method checks if a version of the Workflow exists with the same structure,
and will return the existing version if so::

    from yawn.workflow.serializers import WorkflowSerializer

    # test_views.data() returns a workflow dict in the REST API format shown above
    serializer = WorkflowSerializer(data=test_views.data())
    serializer.is_valid(raise_exception=True)
    workflow = serializer.save()
    workflow.submit_run()

Links
-----

* Contributing_
* License_
* `Deploying YAWN on Kubernetes via Google Container Engine`_

.. _Contributing: CONTRIBUTING.rst
.. _License: LICENSE.txt
.. _Deploying YAWN on Kubernetes via Google Container Engine: https://github.com/aclowes/yawn-gke
