django-datawatch


Namedjango-datawatch JSON
Version 5.0.0 PyPI version JSON
download
home_pagehttps://github.com/RegioHelden/django-datawatch
SummaryDjango Datawatch runs automated data checks in your Django installation
upload_time2024-03-06 17:38:10
maintainer
docs_urlNone
authorJens Nistler <opensource@jensnistler.de>, Bogdan Radko <bogdan.radko@regiohelden.de>
requires_python
licenseMIT
keywords django monitoring datawatch check checks
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![PyPI version](https://badge.fury.io/py/django-datawatch.svg)](https://badge.fury.io/py/django-datawatch)
[![GitHub build status](https://github.com/RegioHelden/django-datawatch/workflows/Test/badge.svg)](https://github.com/RegioHelden/django-datawatch/actions)
[![Open Source Love](https://badges.frapsoft.com/os/v2/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)
[![MIT Licence](https://badges.frapsoft.com/os/mit/mit.svg?v=103)](https://opensource.org/licenses/mit-license.php)

# Django Datawatch

With Django Datawatch you are able to implement arbitrary checks on data, review their status and even describe what to do to resolve them.
Think of [nagios](https://www.nagios.org/)/[icinga](https://www.icinga.org/) for data.

## Check execution backends

### Synchronous

Will execute all tasks synchronously which is not recommended but the most simple way to get started.

### Celery

Will execute the tasks asynchronously using celery as a task broker and executor.
Requires celery 5.0.0 or later.

### Other backends

Feel free to implement other task execution backends and send a pull request.

## Install

```shell
$ pip install django-datawatch
```

Add `django_datawatch` to your `INSTALLED_APPS`

## Celery beat database scheduler

If the datawatch scheduler should be run using the celery beat database scheduler, you need to install [django_celery_beat](hhttp://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#beat-custom-schedulers).

Add `django_datawatch.tasks.django_datawatch_scheduler` to the `CELERYBEAT_SCHEDULE` of your app.
This task should be executed every minute e.g. `crontab(minute='*/1')`, see example app.

## Write a custom check

Create `checks.py` inside your module.

```python
from datetime import datetime

from celery.schedules import crontab

from django_datawatch.datawatch import datawatch
from django_datawatch.base import BaseCheck, CheckResponse
from django_datawatch.models import Result


@datawatch.register
class CheckTime(BaseCheck):
    run_every = crontab(minute='*/5')  # scheduler will execute this check every 5 minutes

    def generate(self):
        yield datetime.now()

    def check(self, payload):
        response = CheckResponse()
        if payload.hour <= 7:
            response.set_status(Result.STATUS.ok)
        elif payload.hour <= 12:
            response.set_status(Result.STATUS.warning)
        else:
            response.set_status(Result.STATUS.critical)
        return response

    def get_identifier(self, payload):
        # payload will be our datetime object that we are getting from generate method
        return payload

    def get_payload(self, identifier):
        # as get_identifier returns the object we don't need to process it
        # we can return identifier directly
        return identifier

    def user_forced_refresh_hook(self, payload):
        payload.do_something()
```

### .generate

Must yield payloads to be checked. The check method will then be called for every payload.

### .check

Must return an instance of CheckResponse.

### .get_identifier

Must return a unique identifier for the payload. 

### .user_forced_refresh_hook

A function that gets executed when the refresh is requested by a user through the `ResultRefreshView`.

This is used in checks that are purely based on triggers, e.g. when a field changes the test gets executed.

### trigger check updates

Check updates for individual payloads can also be triggered when related datasets are changed.
The map for update triggers is defined in the Check class' trigger_update attribute.

```
trigger_update = dict(subproduct=models_customer.SubProduct)
```

The key is a slug to define your trigger while the value is the model that issues the trigger when saved.
You must implement a resolver function for each entry with the name of get_<slug>_payload which returns the payload or multiple payloads (as a list) to check (same datatype as .check would expect or .generate would yield).
```
def get_subproduct_payload(self, instance):
    return instance.product
```

## Exceptions

#### `DatawatchCheckSkipException`
raise this exception to skip current check. The result will not appear in the checks results. 

## Run your checks

A management command is provided to queue the execution of all checks based on their schedule.
Add a crontab to run this command every minute and it will check if there's something to do.

```shell
$ ./manage.py datawatch_run_checks
$ ./manage.py datawatch_run_checks --slug=example.checks.UserHasEnoughBalance
```

## Refresh your check results

A management command is provided to forcefully refresh all existing results for a check.
This comes in handy if you changes the logic of your check and don't want to wait until the periodic execution or an update trigger.

```shell
$ ./manage.py datawatch_refresh_results
$ ./manage.py datawatch_refresh_results --slug=example.checks.UserHasEnoughBalance
```

## Get a list of registered checks

```shell
$ ./manage.py datawatch_list_checks
```

## Clean up your database

Remove the unnecessary check results and executions if you've removed the code for a check.

```shell
$ ./manage.py datawatch_clean_up
```

## Settings

```python
DJANGO_DATAWATCH_BACKEND = 'django_datawatch.backends.synchronous'
DJANGO_DATAWATCH_RUN_SIGNALS = True
```

### DJANGO_DATAWATCH_BACKEND

You can chose the backend to run the tasks. Supported are 'django_datawatch.backends.synchronous' and 'django_datawatch.backends.celery'.

Default: 'django_datawatch.backends.synchronous'

### DJANGO_DATAWATCH_RUN_SIGNALS

Use this setting to disable running post_save updates during unittests if required.

Default: True

### celery task queue

Datawatch supported setting a specific queue in release < 0.4.0

With the switch to celery 4, you should use task routing to define the queue for your tasks, see http://docs.celeryproject.org/en/latest/userguide/routing.html

## Migrating from 3.x to 4.x

In `4.x`, the base check class supports two new methods (`get_assigned_users` and `get_assigned_groups`). This is a breaking change as the old `get_assigned_user` and `get_assigned_group` methods have been removed.

This change allows checks to assign multiple users and groups to a check. If you have implemented custom checks, you need to update your code to return a list of users and groups instead of a single user and group. e.g.

From

```python
from django_datawatch.base import BaseCheck

class CustomCheck(BaseCheck):
    ...
    def get_assigned_user(self, payload, result) -> Optional[AbstractUser]:
        return None # or a user
    def get_assigned_group(self, payload, result) -> Optional[Group]:
        return None # or a group
```

To

```python
from django_datawatch.base import BaseCheck

class CustomCheck(BaseCheck):
    ...
    def get_assigned_users(self, payload, result) -> Optional[List[AbstractUser]]:
        return None  # or a list of users
    def get_assigned_groups(self, payload, result) -> Optional[List[Group]]:
        return None  # or a list of groups
```

# CONTRIBUTE

## Dev environment
- Latest [Docker](https://docs.docker.com/) with the integrated compose plugin

Please make sure that no other container is using port 8000 as this is the one you're install gets exposed to:
http://localhost:8000/

## Setup

We've included an example app to show how django_datawatch works.
Start by launching the included docker container.
```bash
docker compose up -d
```

Then setup the example app environment.
```bash
docker compose run --rm app migrate
docker compose run --rm app loaddata example
```
The installed superuser is "example" with password "datawatch".

## Run checks

Open http://localhost:8000/, log in and then go back to http://localhost:8000/.
You'll be prompted with an empty dashboard. That's because we didn't run any checks yet.
Let's enqueue an update.
```bash
docker compose run --rm app datawatch_run_checks --force
```

The checks for the example app are run synchronously and should be updated immediately.
If you decide to switch to the celery backend, you should now start a celery worker to process the checks.
```bash
docker compose run --rm --entrypoint celery app -A example worker -l DEBUG
```

To execute the celery beat scheduler which runs the datawatch scheduler every minute, just run:
```bash
docker compose run --rm --entrypoint celery app -A example  beat --scheduler django_celery_beat.schedulers:DatabaseScheduler
```

You will see some failed check now after you refreshed the dashboard view.

![Django Datawatch dashboard](http://static.jensnistler.de/django_datawatch.png "Django Datawatch dashboard")

![Django Datawatch detail view](http://static.jensnistler.de/django_datawatch_details.png "Django Datawatch detail view")

## Run the tests
```bash
docker compose run --rm app test
```

## Requirements upgrades

Check for upgradeable packages by running 
```bash
docker compose up -d
docker compose exec app pip-check
```

## Translations

Collect and compile translations for all registered locales

```bash
docker compose run --rm app makemessages --no-location --all
docker compose run --rm app compilemessages
```

## Making a new release

[bumpversion](https://github.com/peritus/bumpversion) is used to manage releases.

Add your changes to the [CHANGELOG](./CHANGELOG.rst), run
```bash
docker compose exec app bumpversion <major|minor|patch>
```
then push (including tags).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/RegioHelden/django-datawatch",
    "name": "django-datawatch",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "django,monitoring,datawatch,check,checks",
    "author": "Jens Nistler <opensource@jensnistler.de>, Bogdan Radko <bogdan.radko@regiohelden.de>",
    "author_email": "opensource@regiohelden.de",
    "download_url": "https://files.pythonhosted.org/packages/b0/f0/a411a228b880550ad3a254f6779db888eefe66534ffa3c6d901c92b93706/django-datawatch-5.0.0.tar.gz",
    "platform": null,
    "description": "[![PyPI version](https://badge.fury.io/py/django-datawatch.svg)](https://badge.fury.io/py/django-datawatch)\n[![GitHub build status](https://github.com/RegioHelden/django-datawatch/workflows/Test/badge.svg)](https://github.com/RegioHelden/django-datawatch/actions)\n[![Open Source Love](https://badges.frapsoft.com/os/v2/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/)\n[![MIT Licence](https://badges.frapsoft.com/os/mit/mit.svg?v=103)](https://opensource.org/licenses/mit-license.php)\n\n# Django Datawatch\n\nWith Django Datawatch you are able to implement arbitrary checks on data, review their status and even describe what to do to resolve them.\nThink of [nagios](https://www.nagios.org/)/[icinga](https://www.icinga.org/) for data.\n\n## Check execution backends\n\n### Synchronous\n\nWill execute all tasks synchronously which is not recommended but the most simple way to get started.\n\n### Celery\n\nWill execute the tasks asynchronously using celery as a task broker and executor.\nRequires celery 5.0.0 or later.\n\n### Other backends\n\nFeel free to implement other task execution backends and send a pull request.\n\n## Install\n\n```shell\n$ pip install django-datawatch\n```\n\nAdd `django_datawatch` to your `INSTALLED_APPS`\n\n## Celery beat database scheduler\n\nIf the datawatch scheduler should be run using the celery beat database scheduler, you need to install [django_celery_beat](hhttp://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#beat-custom-schedulers).\n\nAdd `django_datawatch.tasks.django_datawatch_scheduler` to the `CELERYBEAT_SCHEDULE` of your app.\nThis task should be executed every minute e.g. `crontab(minute='*/1')`, see example app.\n\n## Write a custom check\n\nCreate `checks.py` inside your module.\n\n```python\nfrom datetime import datetime\n\nfrom celery.schedules import crontab\n\nfrom django_datawatch.datawatch import datawatch\nfrom django_datawatch.base import BaseCheck, CheckResponse\nfrom django_datawatch.models import Result\n\n\n@datawatch.register\nclass CheckTime(BaseCheck):\n    run_every = crontab(minute='*/5')  # scheduler will execute this check every 5 minutes\n\n    def generate(self):\n        yield datetime.now()\n\n    def check(self, payload):\n        response = CheckResponse()\n        if payload.hour <= 7:\n            response.set_status(Result.STATUS.ok)\n        elif payload.hour <= 12:\n            response.set_status(Result.STATUS.warning)\n        else:\n            response.set_status(Result.STATUS.critical)\n        return response\n\n    def get_identifier(self, payload):\n        # payload will be our datetime object that we are getting from generate method\n        return payload\n\n    def get_payload(self, identifier):\n        # as get_identifier returns the object we don't need to process it\n        # we can return identifier directly\n        return identifier\n\n    def user_forced_refresh_hook(self, payload):\n        payload.do_something()\n```\n\n### .generate\n\nMust yield payloads to be checked. The check method will then be called for every payload.\n\n### .check\n\nMust return an instance of CheckResponse.\n\n### .get_identifier\n\nMust return a unique identifier for the payload. \n\n### .user_forced_refresh_hook\n\nA function that gets executed when the refresh is requested by a user through the `ResultRefreshView`.\n\nThis is used in checks that are purely based on triggers, e.g. when a field changes the test gets executed.\n\n### trigger check updates\n\nCheck updates for individual payloads can also be triggered when related datasets are changed.\nThe map for update triggers is defined in the Check class' trigger_update attribute.\n\n```\ntrigger_update = dict(subproduct=models_customer.SubProduct)\n```\n\nThe key is a slug to define your trigger while the value is the model that issues the trigger when saved.\nYou must implement a resolver function for each entry with the name of get_<slug>_payload which returns the payload or multiple payloads (as a list) to check (same datatype as .check would expect or .generate would yield).\n```\ndef get_subproduct_payload(self, instance):\n    return instance.product\n```\n\n## Exceptions\n\n#### `DatawatchCheckSkipException`\nraise this exception to skip current check. The result will not appear in the checks results. \n\n## Run your checks\n\nA management command is provided to queue the execution of all checks based on their schedule.\nAdd a crontab to run this command every minute and it will check if there's something to do.\n\n```shell\n$ ./manage.py datawatch_run_checks\n$ ./manage.py datawatch_run_checks --slug=example.checks.UserHasEnoughBalance\n```\n\n## Refresh your check results\n\nA management command is provided to forcefully refresh all existing results for a check.\nThis comes in handy if you changes the logic of your check and don't want to wait until the periodic execution or an update trigger.\n\n```shell\n$ ./manage.py datawatch_refresh_results\n$ ./manage.py datawatch_refresh_results --slug=example.checks.UserHasEnoughBalance\n```\n\n## Get a list of registered checks\n\n```shell\n$ ./manage.py datawatch_list_checks\n```\n\n## Clean up your database\n\nRemove the unnecessary check results and executions if you've removed the code for a check.\n\n```shell\n$ ./manage.py datawatch_clean_up\n```\n\n## Settings\n\n```python\nDJANGO_DATAWATCH_BACKEND = 'django_datawatch.backends.synchronous'\nDJANGO_DATAWATCH_RUN_SIGNALS = True\n```\n\n### DJANGO_DATAWATCH_BACKEND\n\nYou can chose the backend to run the tasks. Supported are 'django_datawatch.backends.synchronous' and 'django_datawatch.backends.celery'.\n\nDefault: 'django_datawatch.backends.synchronous'\n\n### DJANGO_DATAWATCH_RUN_SIGNALS\n\nUse this setting to disable running post_save updates during unittests if required.\n\nDefault: True\n\n### celery task queue\n\nDatawatch supported setting a specific queue in release < 0.4.0\n\nWith the switch to celery 4, you should use task routing to define the queue for your tasks, see http://docs.celeryproject.org/en/latest/userguide/routing.html\n\n## Migrating from 3.x to 4.x\n\nIn `4.x`, the base check class supports two new methods (`get_assigned_users` and `get_assigned_groups`). This is a breaking change as the old `get_assigned_user` and `get_assigned_group` methods have been removed.\n\nThis change allows checks to assign multiple users and groups to a check. If you have implemented custom checks, you need to update your code to return a list of users and groups instead of a single user and group. e.g.\n\nFrom\n\n```python\nfrom django_datawatch.base import BaseCheck\n\nclass CustomCheck(BaseCheck):\n    ...\n    def get_assigned_user(self, payload, result) -> Optional[AbstractUser]:\n        return None # or a user\n    def get_assigned_group(self, payload, result) -> Optional[Group]:\n        return None # or a group\n```\n\nTo\n\n```python\nfrom django_datawatch.base import BaseCheck\n\nclass CustomCheck(BaseCheck):\n    ...\n    def get_assigned_users(self, payload, result) -> Optional[List[AbstractUser]]:\n        return None  # or a list of users\n    def get_assigned_groups(self, payload, result) -> Optional[List[Group]]:\n        return None  # or a list of groups\n```\n\n# CONTRIBUTE\n\n## Dev environment\n- Latest [Docker](https://docs.docker.com/) with the integrated compose plugin\n\nPlease make sure that no other container is using port 8000 as this is the one you're install gets exposed to:\nhttp://localhost:8000/\n\n## Setup\n\nWe've included an example app to show how django_datawatch works.\nStart by launching the included docker container.\n```bash\ndocker compose up -d\n```\n\nThen setup the example app environment.\n```bash\ndocker compose run --rm app migrate\ndocker compose run --rm app loaddata example\n```\nThe installed superuser is \"example\" with password \"datawatch\".\n\n## Run checks\n\nOpen http://localhost:8000/, log in and then go back to http://localhost:8000/.\nYou'll be prompted with an empty dashboard. That's because we didn't run any checks yet.\nLet's enqueue an update.\n```bash\ndocker compose run --rm app datawatch_run_checks --force\n```\n\nThe checks for the example app are run synchronously and should be updated immediately.\nIf you decide to switch to the celery backend, you should now start a celery worker to process the checks.\n```bash\ndocker compose run --rm --entrypoint celery app -A example worker -l DEBUG\n```\n\nTo execute the celery beat scheduler which runs the datawatch scheduler every minute, just run:\n```bash\ndocker compose run --rm --entrypoint celery app -A example  beat --scheduler django_celery_beat.schedulers:DatabaseScheduler\n```\n\nYou will see some failed check now after you refreshed the dashboard view.\n\n![Django Datawatch dashboard](http://static.jensnistler.de/django_datawatch.png \"Django Datawatch dashboard\")\n\n![Django Datawatch detail view](http://static.jensnistler.de/django_datawatch_details.png \"Django Datawatch detail view\")\n\n## Run the tests\n```bash\ndocker compose run --rm app test\n```\n\n## Requirements upgrades\n\nCheck for upgradeable packages by running \n```bash\ndocker compose up -d\ndocker compose exec app pip-check\n```\n\n## Translations\n\nCollect and compile translations for all registered locales\n\n```bash\ndocker compose run --rm app makemessages --no-location --all\ndocker compose run --rm app compilemessages\n```\n\n## Making a new release\n\n[bumpversion](https://github.com/peritus/bumpversion) is used to manage releases.\n\nAdd your changes to the [CHANGELOG](./CHANGELOG.rst), run\n```bash\ndocker compose exec app bumpversion <major|minor|patch>\n```\nthen push (including tags).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Django Datawatch runs automated data checks in your Django installation",
    "version": "5.0.0",
    "project_urls": {
        "Homepage": "https://github.com/RegioHelden/django-datawatch"
    },
    "split_keywords": [
        "django",
        "monitoring",
        "datawatch",
        "check",
        "checks"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bc67c271b02fa57de5b81cfa7e5d3454097ea4c4744c6c25f602fcad40344bef",
                "md5": "934b492db093fdcccdaa9cb78ac6c3f7",
                "sha256": "e48249beb81b368bd1d49b75b3602e383bb3a6054080a5e9989b0cb77526e007"
            },
            "downloads": -1,
            "filename": "django_datawatch-5.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "934b492db093fdcccdaa9cb78ac6c3f7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 42725,
            "upload_time": "2024-03-06T17:38:08",
            "upload_time_iso_8601": "2024-03-06T17:38:08.680599Z",
            "url": "https://files.pythonhosted.org/packages/bc/67/c271b02fa57de5b81cfa7e5d3454097ea4c4744c6c25f602fcad40344bef/django_datawatch-5.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b0f0a411a228b880550ad3a254f6779db888eefe66534ffa3c6d901c92b93706",
                "md5": "a3eacb213e643cbe44acc1f04f1ed603",
                "sha256": "42ef36ae75f4bef0a68da49db1dd8b6134f9c758068b04c8165d900cc993a904"
            },
            "downloads": -1,
            "filename": "django-datawatch-5.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a3eacb213e643cbe44acc1f04f1ed603",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 32865,
            "upload_time": "2024-03-06T17:38:10",
            "upload_time_iso_8601": "2024-03-06T17:38:10.861470Z",
            "url": "https://files.pythonhosted.org/packages/b0/f0/a411a228b880550ad3a254f6779db888eefe66534ffa3c6d901c92b93706/django-datawatch-5.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-06 17:38:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "RegioHelden",
    "github_project": "django-datawatch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "django-datawatch"
}
        
Elapsed time: 1.80983s