:Name: zocalo
:Version: 1.2.0
:Summary: Infrastructure components for automated data processing at Diamond Light Source
:Upload time: 2024-11-14 13:31:56
:Author: Nicholas Devenish
:Requires Python: <4.0,>=3.10

======
Zocalo
======


.. image:: https://img.shields.io/pypi/v/zocalo.svg
        :target: https://pypi.python.org/pypi/zocalo
        :alt: PyPI release

.. image:: https://img.shields.io/conda/vn/conda-forge/zocalo.svg
        :target: https://anaconda.org/conda-forge/zocalo
        :alt: Conda version

.. image:: https://dev.azure.com/zocalo/python-zocalo/_apis/build/status/DiamondLightSource.python-zocalo?branchName=main
        :target: https://dev.azure.com/zocalo/python-zocalo/_build/latest?definitionId=2&branchName=main
        :alt: Build status

.. image:: https://img.shields.io/lgtm/grade/python/g/DiamondLightSource/python-zocalo.svg?logo=lgtm&logoWidth=18
        :target: https://lgtm.com/projects/g/DiamondLightSource/python-zocalo/context:python
        :alt: Language grade: Python

.. image:: https://img.shields.io/lgtm/alerts/g/DiamondLightSource/python-zocalo.svg?logo=lgtm&logoWidth=18
        :target: https://lgtm.com/projects/g/DiamondLightSource/python-zocalo/alerts/
        :alt: Total alerts

.. image:: https://readthedocs.org/projects/zocalo/badge/?version=latest
        :target: https://zocalo.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation status

.. image:: https://img.shields.io/pypi/pyversions/zocalo.svg
        :target: https://pypi.org/project/zocalo/
        :alt: Supported Python versions

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
        :target: https://github.com/ambv/black
        :alt: Code style: black

.. image:: https://img.shields.io/pypi/l/zocalo.svg
        :target: https://pypi.python.org/pypi/zocalo
        :alt: BSD license

..

        |
        | `M. Gerstel, A. Ashton, R.J. Gildea, K. Levik, and G. Winter, "Data Analysis Infrastructure for Diamond Light Source Macromolecular & Chemical Crystallography and Beyond", in Proc. ICALEPCS'19, New York, NY, USA, Oct. 2019, pp. 1031-1035. <https://doi.org/10.18429/JACoW-ICALEPCS2019-WEMPR001>`_ |DOI|

        .. |DOI| image:: https://img.shields.io/badge/DOI-10.18429/JACoW--ICALEPCS2019--WEMPR001-blue.svg
                :target: https://doi.org/10.18429/JACoW-ICALEPCS2019-WEMPR001
                :alt: Primary Reference DOI

|

Zocalo is an automated data processing system designed at Diamond Light Source. This repository contains infrastructure components for Zocalo.

The idea of Zocalo is a simple one: to build a messaging framework in which text-based messages are sent between parts of the system to coordinate data analysis. In the wider scope of things this also covers tasks such as archiving, but generally it handles everything that happens after data acquisition.

Zocalo as a wider whole is made up of two repositories (plus some private internal repositories when deployed at Diamond):

* `DiamondLightSource/python-zocalo <https://github.com/DiamondLightSource/python-zocalo>`_ - Infrastructure components for automated data processing, developed by Diamond Light Source. The package is available through `PyPI <https://pypi.org/project/zocalo/>`__ and `conda-forge <https://anaconda.org/conda-forge/zocalo>`__.
* `DiamondLightSource/python-workflows <https://github.com/DiamondLightSource/python-workflows/>`_ - Zocalo is built on the workflows package. It shouldn't be necessary to interact too much with this package, as the details are abstracted by Zocalo. workflows controls the logic of how services connect to each other and what a service is, and it is the part that actually sends the messages to a message broker. Currently this is an ActiveMQ_ broker (via STOMP_) but support for a RabbitMQ_ broker (via pika_) is being added. This is also available on `PyPI <https://pypi.org/project/workflows/>`__ and `conda-forge <https://anaconda.org/conda-forge/workflows>`__.

As mentioned, Zocalo is currently built on top of ActiveMQ. ActiveMQ is an Apache project that provides a `message broker <https://en.wikipedia.org/wiki/Message_broker>`_ server, acting as a central dispatch that allows various services to communicate. Messages are plaintext, but from the Zocalo point of view it is passing around Python objects (JSON dictionaries). Every message has a destination, which the message broker uses to route it. Messages may either be sent to a specific queue or broadcast to multiple queues. These queues are subscribed to by the services that run in Zocalo. When developing with Zocalo you may have to interact with ActiveMQ or RabbitMQ, but it is unlikely that you will have to configure either of them.
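
The difference between sending to a specific queue and broadcasting can be illustrated with a toy in-memory model. This is only a sketch: the ``ToyBroker`` class, its method names and the queue names are hypothetical and are not part of Zocalo, ``python-workflows`` or ActiveMQ. A real broker also deals with persistence, acknowledgements and network transport.

.. code:: python

        import json
        from collections import defaultdict, deque


        class ToyBroker:
            """Hypothetical in-memory stand-in for a message broker."""

            def __init__(self):
                self.queues = defaultdict(deque)  # queue name -> pending messages
                self.topics = defaultdict(list)  # topic name -> subscribed queues

            def send(self, queue, message):
                # Messages travel as plaintext; here that means a JSON string.
                self.queues[queue].append(json.dumps(message))

            def subscribe_to_topic(self, topic, queue):
                self.topics[topic].append(queue)

            def broadcast(self, topic, message):
                # A broadcast places a copy on every queue subscribed to the topic.
                for queue in self.topics[topic]:
                    self.send(queue, message)

            def receive(self, queue):
                # The consumer decodes the plaintext back into a Python dictionary.
                return json.loads(self.queues[queue].popleft())


        broker = ToyBroker()
        broker.send("per_image_analysis", {"image": 1})
        broker.subscribe_to_topic("status", "monitor_a")
        broker.subscribe_to_topic("status", "monitor_b")
        broker.broadcast("status", {"state": "running"})

In this model a message sent to a queue is delivered once, while a broadcast message reaches every subscriber.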

Zocalo allows for the monitoring of jobs executing ``python-workflows`` services or recipe wrappers. The ``python-workflows`` package contains most of the infrastructure required for the jobs themselves and more detailed documentation of its components can be found in the ``python-workflows`` `GitHub repository <https://github.com/DiamondLightSource/python-workflows/>`_ and `the Zocalo documentation <https://zocalo.readthedocs.io>`_.

.. _ActiveMQ: http://activemq.apache.org/
.. _STOMP: https://stomp.github.io/
.. _RabbitMQ: https://www.rabbitmq.com/
.. _pika: https://github.com/pika/pika

Core Concepts
-------------

There are two kinds of task run in Zocalo: *services* and *wrappers*.
A service should handle a discrete short-lived task, for example a data processing job on a small data packet (e.g. finding spots on a single image in an X-ray crystallography context), or inserting results into a database.
In contrast, wrappers can be used for longer running tasks, for example running data processing programs such as xia2_ or fast_ep_.

* A **service** starts in the background and waits for work. There are many services constantly running as part of normal Zocalo operation. In typical usage at Diamond there are ~100 services running at a time.
* A **wrapper**, on the other hand, is only run when needed. It wraps something that is not necessarily aware of Zocalo: downstream processing software such as xia2 has no idea what Zocalo is, and shouldn't have to. A wrapper takes a message, converts it into a command line invocation, runs the software (typically as a cluster job), then reformats the results into a message to send back to Zocalo. The wrapped processes know nothing about Zocalo; they are simply run by a script that handles the wrapping.
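
The wrapper pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the ``zocalo.wrap`` implementation: the function ``run_wrapped`` and the message fields ``job``, ``program`` and ``arguments`` are all hypothetical.

.. code:: python

        import json
        import subprocess
        import sys


        def run_wrapped(message):
            """Turn a request message into a command line, run it, report back."""
            # Hypothetical message layout: a program name plus its arguments.
            command = [message["program"], *message["arguments"]]
            completed = subprocess.run(command, capture_output=True, text=True)
            # Reformat the outcome into a message that could be sent onwards.
            return {
                "job": message["job"],
                "success": completed.returncode == 0,
                "stdout": completed.stdout.strip(),
            }


        # The wrapped program knows nothing about messages. Here it is just
        # Python printing a line, standing in for software such as xia2.
        request = {
            "job": "example-1",
            "program": sys.executable,
            "arguments": ["-c", "print('processing finished')"],
        }
        result = run_wrapped(request)
        print(json.dumps(result))

The wrapped program only sees an ordinary command line, so it needs no changes to take part in the pipeline.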

At Diamond, everything goes to one service to start with: the **Dispatcher**. This takes the initial request message and attaches useful information for the rest of Zocalo. The implementation of the Dispatcher at Diamond is environment-specific and not public, but it does some things that would be useful for a similar service to do in other contexts. At Diamond there is interaction with the `ISPyB database <https://github.com/DiamondLightSource/ispyb-database>`_, which stores information about what is run, metadata, how many images were collected, sample type, etc. Data stored in the database influences what software we want to run, and this information might need to be read from the database by a great many services. We don't want many clients to read the same thing and flood the database, nor do we want the database to be a single point of failure. The Dispatcher therefore front-loads all the database operations: it takes the data collection ID (DCID) and looks up in ISPyB all the information that could be needed for processing. In terms of movement through the system, it sits between the initial message and the services:

.. code:: text

        message -> Dispatcher -> [Services]
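
The front-loading idea can be sketched as follows. This is a minimal illustration, not Diamond's Dispatcher, which is not public: the ``dispatch`` function, the ``FAKE_DATABASE`` dictionary and every field name and value in it are invented for the example.

.. code:: python

        # Stand-in for the database: data collection ID -> stored metadata.
        # All values here are invented for illustration.
        FAKE_DATABASE = {
            1234: {"image_count": 3600, "sample_type": "protein"},
        }


        def dispatch(request, database=FAKE_DATABASE):
            """Attach everything later services may need, using one lookup."""
            dcid = request["dcid"]
            enriched = dict(request)  # leave the caller's message untouched
            enriched["metadata"] = dict(database[dcid])
            return enriched


        message = dispatch({"dcid": 1234, "recipe": "example-recipe"})

Every service further along the chain can then read ``message["metadata"]`` instead of querying the database itself.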

At the end of processing there might be information that needs to go back into the database, for which Diamond has a special ISPyB service to do the writing. If the database goes down, that is fine: messages queue up for the ISPyB service and are processed and written once the database becomes available again. This isolates us somewhat from intermittent failures.
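
The queue-and-retry behaviour can be modelled in a few lines. This is a toy sketch, assuming a hypothetical ``Database`` class that can be switched off to mimic an outage; it says nothing about how Diamond's ISPyB service is actually written.

.. code:: python

        from collections import deque


        class Database:
            """Hypothetical database that can be switched off to mimic an outage."""

            def __init__(self):
                self.available = True
                self.rows = []

            def write(self, row):
                if not self.available:
                    raise ConnectionError("database unavailable")
                self.rows.append(row)


        def drain(pending, database):
            """Write queued results in order, stopping at the first failure."""
            while pending:
                try:
                    database.write(pending[0])
                except ConnectionError:
                    return False  # leave the result queued for a later attempt
                pending.popleft()  # only remove it once the write succeeded
            return True


        db = Database()
        pending = deque([{"dcid": 1234, "result": "ok"}])

        db.available = False
        first_attempt = drain(pending, db)  # outage: the result stays queued

        db.available = True
        second_attempt = drain(pending, db)  # recovered: the result is written

Because a result is only removed from the queue after a successful write, an outage delays the write rather than losing it.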

The only public Zocalo service at present is ``Schlockmeister``, a garbage collection service that removes jobs that have been requeued multiple times. Diamond operates a variety of internal Zocalo services which perform frequently required operations in a data analysis pipeline.

.. _xia2: https://xia2.github.io/
.. _fast_ep: https://github.com/DiamondLightSource/fast_ep

Working with Zocalo
-------------------

`Graylog <https://www.graylog.org/>`_ is used to manage the logs produced by Zocalo. Once Graylog and the message broker server are running then services and wrappers can be launched with Zocalo.

Zocalo provides the following command line tools:

* ``zocalo.go``: trigger the processing of a recipe
* ``zocalo.wrap``: run a command while exposing its status to Zocalo so that it can be tracked
* ``zocalo.service``: start a new instance of a service
* ``zocalo.shutdown``: shut down either specific instances of Zocalo services or all instances of a given type of service
* ``zocalo.queue_drain``: drain one queue into another in a controlled manner

Services are available through ``zocalo.service`` if they are linked through the ``workflows.services`` entry point in ``setup.py``. For example, to start a Schlockmeister service:

.. code:: bash

        $ zocalo.service -s Schlockmeister
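
To check which services are registered in the current environment, the ``workflows.services`` entry point group can be inspected with the standard library. This is a sketch: the helper name ``list_registered_services`` is hypothetical, and it does not describe how ``zocalo.service`` performs its own lookup. The ``group`` keyword of ``entry_points`` is available from Python 3.10.

.. code:: python

        from importlib.metadata import entry_points


        def list_registered_services():
            """Return the names registered under the workflows.services group."""
            found = entry_points(group="workflows.services")
            return sorted(entry_point.name for entry_point in found)


        print(list_registered_services())

The list is empty when no installed package registers a service under that group.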

.. list-table::
        :widths: 100
        :header-rows: 1

        * - Q: How are services started?
        * - A: Zocalo itself is agnostic on this point. Some of the services are self-propagating and employ simple scaling behaviour, in particular the per-image-analysis services. In general the services all run on cluster nodes, which means that they cannot be long-lived: beyond a couple of hours there is a high risk of the service cluster jobs being terminated or pre-empted. Knowing that a service could be killed at any time also encourages writing more robust services.

.. list-table::
        :widths: 100
        :header-rows: 1

        * - Q: So if a service is terminated in the middle of processing it will still get processed?
        * - A: Yes, messages are handled in transactions. While a service is processing a message, the message is marked as "in-progress" but is not dropped. If the service doesn't finish processing the message, or its connection to ActiveMQ is dropped, then the message gets requeued so that another instance of the service can pick it up.
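
The requeue-on-failure behaviour can be modelled with a toy consumer. This is a simplified sketch that uses a ``deque`` in place of a real broker; the ``consume`` function is hypothetical and is not the ``python-workflows`` transaction API.

.. code:: python

        from collections import deque


        def consume(queue, handler):
            """Process one message, requeueing it if the handler fails."""
            message = queue.popleft()  # the message is now "in progress"
            try:
                handler(message)
            except Exception:
                queue.append(message)  # not acknowledged: make it available again
                return False
            return True  # acknowledged: the message is gone for good


        def crashing_service(message):
            raise RuntimeError("service terminated mid-processing")


        processed = []
        queue = deque([{"image": 7}])
        first = consume(queue, crashing_service)  # fails, message is requeued
        second = consume(queue, processed.append)  # another instance succeeds

The message only leaves the queue permanently once a handler has completed without error.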

Repeat Message Failure
----------------------

How are repeat errors handled? This is a known problem with the system: if, for example, an image or malformed message kills a service, the message is requeued and will eventually kill every running instance of that service (each of which is re-spawned, dies again, and so forth).

We have a special service that looks for repeat failures and moves the offending messages to a special "Dead Letter Queue" (DLQ). This service is called Schlockmeister_, and at the time of writing it is the only service that has migrated to the public Zocalo repository. It looks inside the message that was sent, extracts some basic information in as safe a way as possible, and repackages the message to the DLQ together with information on what it was working on and the "history" of where the message chain has been routed.
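
The general dead-letter pattern can be sketched as below. This is not Schlockmeister's code: the ``handle_failure`` function, the ``MAX_DELIVERIES`` threshold and the message fields are assumptions made for illustration.

.. code:: python

        from collections import deque

        MAX_DELIVERIES = 3  # hypothetical threshold, not a Zocalo setting


        def handle_failure(message, work_queue, dead_letter_queue):
            """Requeue a failed message, or set it aside after repeated failures."""
            message["deliveries"] = message.get("deliveries", 0) + 1
            if message["deliveries"] >= MAX_DELIVERIES:
                # Repackage with a short summary so the failure can be diagnosed.
                dead_letter_queue.append(
                    {
                        "payload": message["payload"],
                        "history": message.get("history", []),
                    }
                )
            else:
                work_queue.append(message)


        work, dlq = deque(), deque()
        poison = {
            "payload": "malformed",
            "history": ["dispatcher", "per_image_analysis"],
        }
        for _ in range(MAX_DELIVERIES):
            handle_failure(poison, work, dlq)
            if work:
                # The next service instance picks the message up and fails again.
                poison = work.popleft()

After the threshold is reached the message sits in the dead letter queue instead of circulating and killing further service instances.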

.. _Schlockmeister: https://github.com/DiamondLightSource/python-zocalo/tree/master/zocalo/service




            
