Name | yawns |
Version | 0.3.1 |
home_page | None |
Summary | Yet Another Workflow Engine, a subprocess-based DAG execution system |
upload_time | 2025-07-16 23:26:36 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.6 |
license | None |
keywords | task, execution, subprocess, dag, workflow |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
YAWN: Yet Another Workflow Engine
=================================
YAWN provides a framework for executing a set of shell commands with dependencies
in a distributed manner and on a repeating schedule. Other tools do similar things and
inspired this one, particularly Celery_ and Airflow_.
Browse it live at https://yawn.live, deployed on GKE_.
.. _Celery: http://www.celeryproject.org/
.. _Airflow: https://airflow.incubator.apache.org/
.. _GKE: https://github.com/aclowes/yawn-gke
.. image:: https://circleci.com/gh/aclowes/yawn/tree/master.svg?style=svg
   :target: https://circleci.com/gh/aclowes/yawn/tree/master
.. image:: https://codecov.io/gh/aclowes/yawn/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/aclowes/yawn
Principal Differences
---------------------
YAWN is inspired by, but different from, Celery and Airflow because it:

* Runs each task in a separate subprocess, like Airflow but unlike Celery, which avoids pollution
  of a shared Python interpreter and makes memory usage easier to reason about.

* Uses PostgreSQL as the message broker and database, eliminating the need for a separate broker
  like Redis or RabbitMQ. This avoids the `visibility timeout`_ issue when using Redis as a Celery
  broker. YAWN uses the ``SELECT ... FOR UPDATE SKIP LOCKED`` statement to efficiently select from
  the queue table.

.. _visibility timeout: http://docs.celeryproject.org/en/latest/getting-started/brokers/redis.html#id1

* Stores the command, environment variables, stdout, and stderr for each task execution,
  so it's easier to see the logs and history of what happened. Re-running a task does not
  overwrite the previous run.

* Does not support inputs or outputs other than the command line and environment variables, with
  the intention that client applications should handle state instead.
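
The broker pattern above boils down to a single dequeue query. The following is an
illustrative sketch only; the table and column names (``task_queue``, ``id``, ``payload``)
are hypothetical, not YAWN's actual schema:

```python
def dequeue_sql(table="task_queue"):
    """Build a PostgreSQL dequeue query using SKIP LOCKED.

    Rows already locked by another worker are skipped rather than
    waited on, so many workers can poll the same table without
    blocking each other or claiming the same task twice.
    """
    return (
        f"DELETE FROM {table} "
        f"WHERE id = ("
        f"SELECT id FROM {table} "
        f"ORDER BY id "
        f"FOR UPDATE SKIP LOCKED "
        f"LIMIT 1"
        f") RETURNING id, payload;"
    )
```

Each worker runs this inside its own transaction; the ``DELETE ... RETURNING`` atomically
claims and removes one unclaimed row.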
Components
----------
Web Server
  The website provides a user interface to view the workflows and tasks running within them.
  It allows you to run an existing workflow or re-run a failed task. The web server also provides
  a REST API to remotely create and run workflows.

Worker
  The worker schedules and executes tasks. The worker uses ``subprocess.Popen`` to run tasks and
  capture stdout and stderr.
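
A minimal sketch of what capturing a task's output with ``subprocess.Popen`` looks like
(illustrative only, not YAWN's actual worker code):

```python
import subprocess

def run_task(command):
    """Run a shell command in its own subprocess, capturing its output.

    Returns (exit_code, stdout, stderr). Each task gets a fresh
    process, so it cannot pollute the worker's interpreter.
    """
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

For example, ``run_task("echo hello")`` returns an exit code of ``0`` and ``"hello\n"``
on stdout, all of which a worker could record against the execution.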
Concepts
--------
Workflow
  A set of Tasks that can depend on each other, forming what is popularly known as a directed
  acyclic graph (DAG). Workflows can be scheduled to run on a regular basis, and they are
  versioned so they can change over time.

Run
  An instance of a Workflow, manually triggered or scheduled.

Task
  A shell command that specifies the upstream tasks it depends on, the number of times to retry,
  and a timeout. The task is given environment variables configured in the Workflow and Run.

Execution
  A single execution of a Task's command, capturing the exit code and standard output and error.

Queue
  A first-in, first-out list of Tasks to execute.

Worker
  A process that reads from a set of Queues and executes the associated Tasks, recording the
  results in an Execution.
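
The dependency rule these concepts imply, that a Task becomes runnable only once all of its
upstream Tasks have succeeded, can be sketched as follows (a hypothetical helper, not YAWN's
actual scheduler):

```python
def ready_tasks(upstream, succeeded, started):
    """Return tasks whose upstream dependencies have all succeeded.

    upstream maps each task name to the set of task names it depends
    on; succeeded and started are sets of task names. Tasks already
    started are never returned again.
    """
    return {
        task
        for task, deps in upstream.items()
        if task not in started and deps <= succeeded
    }
```

With a two-task DAG where ``task_2`` depends on ``task_1``, only ``task_1`` is ready at
first; ``task_2`` becomes ready once ``task_1`` succeeds.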
Installation
------------
To get started using YAWN::

    # install the package - someone already has "yawn" :-(
    pip install yawns

    # install postgres and create the yawn database;
    # the defaults are localhost and no password
    createdb yawn

    # set up the tables by running the database migrations
    yawn migrate

    # create a user to log in with
    yawn createsuperuser

    # create some sample workflows
    yawn examples

    # start the webserver on port 8000
    yawn webserver

    # open a new terminal and start the worker
    yawn worker
Here is a screenshot of the page for a single workflow:
.. image:: https://cloud.githubusercontent.com/assets/910316/21969288/fe40baf0-db51-11e6-97f2-7e6875c1e575.png
REST API
--------
Browse the API by going to http://127.0.0.1:8000/api/ in a browser.
When creating a workflow, the format is (shown as YAML for readability)::

    name: Example
    parameters:
      ENVIRONMENT: production
      CALCULATION_DATE: 2017-01-01
    schedule: 0 0 *
    schedule_active: True

    tasks:
      - name: task_1
        queue: default
        max_retries: 1
        timeout: 30
        command: python my_awesome_program.py $ENVIRONMENT
      - name: task_2
        queue: default
        command: echo $CALCULATION_DATE | grep 2017
        upstream:
          - task_1
``/api/workflows/``
  GET a list of versions or a single workflow version. POST to create or update a workflow
  using the schema shown above. PATCH to change only the ``schedule``, ``schedule_active``, or
  ``parameters`` fields.

  * POST - use the schema shown above
  * PATCH ``{"schedule_active": false}``
``/api/runs/``
  GET a list of runs, optionally filtered to a particular workflow using ``?workflow=<id>``.
  POST to create a new run. PATCH to change the parameters.

  * POST ``{"workflow_id": 1, "parameters": null}``
  * PATCH ``{"parameters": {"ENVIRONMENT": "test"}}``

``/api/tasks/<id>/``
  GET a single task from a workflow run, along with its executions and their status and logging
  information. PATCH to enqueue a task or kill a running execution.

  * PATCH ``{"enqueue": true}``
  * PATCH ``{"terminate": <execution_id>}``
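
As a sketch of how a client might drive the endpoints above, the following builds the
documented ``/api/runs/`` payload; the helper itself is hypothetical, and the result can be
sent with any HTTP client (e.g. ``requests``):

```python
import json

def build_run_request(workflow_id, parameters=None):
    """Build the POST request that starts a new run of a workflow.

    Mirrors the documented /api/runs/ endpoint: the body carries the
    workflow id and an optional parameters mapping.
    """
    body = {"workflow_id": workflow_id, "parameters": parameters}
    return "POST", "/api/runs/", json.dumps(body)

# Example: trigger workflow 1 with default parameters, then send it
# with your HTTP client of choice, e.g.
#   requests.post("http://127.0.0.1:8000" + path, data=body,
#                 headers={"Content-Type": "application/json"})
method, path, body = build_run_request(1)
```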
Python API
----------
Import and use the Django models to create your workflow::
    from yawn.workflow.models import WorkflowName
    from yawn.task.models import Template

    name, _ = WorkflowName.objects.get_or_create(name='Simple Workflow Example')
    workflow = name.new_version(parameters={'MY_OBJECT_ID': '1', 'SOME_SETTING': 'false'})
    task1 = Template.objects.create(workflow=workflow, name='start', command='echo Starting...')
    task2 = Template.objects.create(workflow=workflow, name='task2', command='echo Working on $MY_OBJECT_ID')
    task2.upstream.add(task1)
    task3 = Template.objects.create(workflow=workflow, name='task3',
                                    command='echo Another busy thing && sleep 20')
    task3.upstream.add(task1)
    task4 = Template.objects.create(workflow=workflow, name='done', command='echo Finished!')
    task4.upstream.add(task2, task3)

    workflow.submit_run(parameters={'child': 'true'})
Alternatively, use the serializer to supply the tasks as a dictionary in the format used
by the API. This method checks whether a version of the Workflow with the same structure
already exists, and returns the existing version if so::
    from yawn.workflow.serializers import WorkflowSerializer

    serializer = WorkflowSerializer(data=test_views.data())
    serializer.is_valid(raise_exception=True)
    workflow = serializer.save()
    workflow.submit_run()
Links
-----
* Contributing_
* License_
* `Deploying YAWN on Kubernetes via Google Container Engine`_
.. _Contributing: CONTRIBUTING.rst
.. _License: LICENSE.txt
.. _Deploying YAWN on Kubernetes via Google Container Engine: https://github.com/aclowes/yawn-gke
Raw data
{
"_id": null,
"home_page": null,
"name": "yawns",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "Alec Clowes <aclowes@gmail.com>",
"keywords": "task, execution, subprocess, dag, workflow",
"author": null,
"author_email": "Alec Clowes <aclowes@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/70/a6/486c47c0aba0ff4f3bbe21578ba4c1cea5d6b20dda94e010388acfe81759/yawns-0.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Yet Another Workflow Engine, a subprocess-based DAG execution system",
"version": "0.3.1",
"project_urls": {
"Homepage": "https://github.com/aclowes/yawn"
},
"split_keywords": [
"task",
" execution",
" subprocess",
" dag",
" workflow"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "06662cbd9155db40ef63c70241e4f9cd26fe5174b939f16eb6a072f5391e500b",
"md5": "a0d7c37079e7e76014a45608e28d68c9",
"sha256": "d2af97f187915eb9f8eeedd80427957cc8ad0c5d6cba43d4b796ba1970c825de"
},
"downloads": -1,
"filename": "yawns-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a0d7c37079e7e76014a45608e28d68c9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 2808633,
"upload_time": "2025-07-16T23:26:33",
"upload_time_iso_8601": "2025-07-16T23:26:33.806660Z",
"url": "https://files.pythonhosted.org/packages/06/66/2cbd9155db40ef63c70241e4f9cd26fe5174b939f16eb6a072f5391e500b/yawns-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "70a6486c47c0aba0ff4f3bbe21578ba4c1cea5d6b20dda94e010388acfe81759",
"md5": "6dcbf438ed3910cb41485701817c4fe0",
"sha256": "d236bb13bf435d30359698cf0066e3387f1b66437ceb119973d861ab809b3fcf"
},
"downloads": -1,
"filename": "yawns-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "6dcbf438ed3910cb41485701817c4fe0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 2773154,
"upload_time": "2025-07-16T23:26:36",
"upload_time_iso_8601": "2025-07-16T23:26:36.806192Z",
"url": "https://files.pythonhosted.org/packages/70/a6/486c47c0aba0ff4f3bbe21578ba4c1cea5d6b20dda94e010388acfe81759/yawns-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-16 23:26:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aclowes",
"github_project": "yawn",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"circle": true,
"lcname": "yawns"
}