django-birdbath

- Version: 2.0.1
- Uploaded: 2024-10-22 14:20:01
- Requires Python: >=3.9
- Author: Torchbox <info@torchbox.com>
- Repository: https://github.com/torchbox/django-birdbath
- Keywords: django, anonymization, data cleaning

# Django Birdbath

A simple tool for giving Django database data a good wash. Anonymise user data, delete stuff you don't need in your development environment, or whatever it is you need to do.

## Usage

1. Add `birdbath` to your `INSTALLED_APPS`
2. Set `BIRDBATH_CHECKS` and `BIRDBATH_PROCESSORS` as appropriate in your settings file (see Configuration below).
3. Run `./manage.py run_birdbath` to trigger processors.

Important! The default processors are destructive and will anonymise User emails and passwords. Do not run in production!

By default, Birdbath enables a [Django system check](https://docs.djangoproject.com/en/3.0/topics/checks/) which will trigger an error if a Birdbath cleanup has not been triggered on the current environment.

This is intended to remind developers that they need to anonymise or clean up their data before running commands such as `runserver`.

The suggested approach is to set `BIRDBATH_REQUIRED` to `False` in production environments using an environment variable.
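
A minimal sketch of that approach for a settings module. The environment variable name `BIRDBATH_REQUIRED` and the `env_bool` helper are assumptions made for this example; Birdbath itself only reads the Django setting.

```python
import os


def env_bool(name: str, default: bool) -> bool:
    """Read a boolean from the environment (helper invented for this sketch)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py: stays True unless the environment explicitly disables it,
# for example by exporting BIRDBATH_REQUIRED=false in production.
BIRDBATH_REQUIRED = env_bool("BIRDBATH_REQUIRED", default=True)
```

Defaulting to `True` means a freshly copied database still triggers the system check unless the environment opts out.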

Checks can be skipped using the `--skip-checks` flag on `run_birdbath` or by setting `BIRDBATH_SKIP_CHECKS = True` in your Django settings.

## Configuration

### Common Settings

- `BIRDBATH_REQUIRED` (default: `True`) - if True, a Django system check will raise an error if anonymisation has not been executed. Set to `False` in your production environments.
- `BIRDBATH_CHECKS` - a list of paths to 'Check' classes to be executed before processors. If any of these returns False, the processors will refuse to run.
- `BIRDBATH_PROCESSORS` - a list of paths to 'Processor' classes to be executed to clean data.
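
A settings fragment along these lines might look as follows. It assumes the dotted paths are the names from the Check/Processor Reference section prefixed with `birdbath.`; verify them against the installed package before relying on them.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "birdbath",
]

# Assumption: full dotted paths are the reference names prefixed with "birdbath.".
BIRDBATH_CHECKS = [
    "birdbath.checks.contrib.heroku.HerokuNotProductionCheck",
]

BIRDBATH_PROCESSORS = [
    "birdbath.processors.users.UserEmailAnonymiser",
    "birdbath.processors.users.UserPasswordAnonymiser",
]
```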

### Processor Specific Settings

- `BIRDBATH_USER_ANONYMISER_EXCLUDE_EMAIL_RE` (default: `example\.com$`) - A regex pattern; users whose email address matches it are excluded from anonymisation.
- `BIRDBATH_USER_ANONYMISER_EXCLUDE_SUPERUSERS` (default: `True`) - If True, users with `is_superuser` set to True will be excluded from anonymisation.
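
To see which addresses the default pattern covers, the sketch below applies it with Python's `re.search`. This only illustrates the pattern itself: how Birdbath applies the setting internally is not shown here, and `is_excluded` is a hypothetical helper.

```python
import re

# Default value of BIRDBATH_USER_ANONYMISER_EXCLUDE_EMAIL_RE, per the list above.
EXCLUDE_EMAIL_RE = r"example\.com$"


def is_excluded(email: str) -> bool:
    """Hypothetical helper: True if the address matches the exclusion pattern."""
    return re.search(EXCLUDE_EMAIL_RE, email) is not None


for address in ["dev@example.com", "dev@example.com.au", "dev@notexample.com"]:
    print(address, is_excluded(address))
```

Under search semantics, `dev@example.com` and `dev@notexample.com` match while `dev@example.com.au` does not, because the pattern is anchored at the end but not at the start of the domain. A stricter pattern such as `@example\.com$` avoids the second match.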

## Implementing your Own

Your site will probably need some checks or processors of its own.

### Checks

Custom checks can be implemented by subclassing `birdbath.checks.BaseCheck` and implementing the `check` method:

```python
import os

from birdbath.checks import BaseCheck


class IsDirtyCheck(BaseCheck):
    def check(self):
        # Pass only when the IS_DIRTY environment variable is set and non-empty.
        return bool(os.environ.get("IS_DIRTY"))
```

The `check` method should return `True` if checking should continue, or `False` to stop checking and prevent processors from running.
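
The gating behaviour described above can be modelled with a few lines of plain Python. This is a conceptual sketch of the documented contract, not Birdbath's actual runner, and `checks_pass` is a name invented for the example.

```python
from typing import Callable, Iterable


def checks_pass(checks: Iterable[Callable[[], bool]]) -> bool:
    """Return True only if every check passes; stop at the first failure."""
    for check in checks:
        if not check():
            # A failing check ends checking, so processors must not run.
            return False
    return True
```

For example, `checks_pass([lambda: True, lambda: False])` returns `False`, and any checks listed after the failing one are never called.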

### Processors

Custom processors can be implemented by subclassing `birdbath.processors.BaseProcessor` and implementing the `run` method:

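```python
from django.contrib.auth import get_user_model

from birdbath.processors import BaseProcessor

User = get_user_model()


class DeleteAllMyUsersProcessor(BaseProcessor):
    def run(self):
        # Delete every user in the database.
        User.objects.all().delete()
```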

There are also more specialised base classes in `birdbath.processors` that can help you write cleaner custom processors. For instance, the processor above could be written using the `BaseModelDeleter` class instead:

```python
from django.contrib.auth import get_user_model

from birdbath.processors import BaseModelDeleter

User = get_user_model()


class DeleteAllMyUsersProcessor(BaseModelDeleter):
    model = User
```

If you only need to delete a subset of users, you can override the `get_queryset()` method, like so:

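```python
from django.contrib.auth import get_user_model

from birdbath.processors import BaseModelDeleter

User = get_user_model()


class DeleteNonStaffUsersProcessor(BaseModelDeleter):
    model = User

    def get_queryset(self):
        # Restrict deletion to users who are not staff.
        return super().get_queryset().filter(is_staff=False)
```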

If you're looking to 'anonymise' rather than delete objects, you will likely find the `BaseModelAnonymiser` class useful. You just need to indicate the fields that should be 'anonymised' or 'cleared', and the class will do the rest. For example:

```python
from django.contrib.auth import get_user_model

from birdbath.processors import BaseModelAnonymiser

User = get_user_model()


class UserAnonymiser(BaseModelAnonymiser):
    model = User

    # generate random replacement values for these fields
    anonymise_fields = ["first_name", "last_name", "email", "password"]


class CustomerProfileAnonymiser(BaseModelAnonymiser):
    # CustomerProfile stands for one of your own models; import it from your app
    model = CustomerProfile

    # generate random replacement values for these fields
    anonymise_fields = ["date_of_birth"]

    # set these fields to ``None`` (if supported), or a blank string
    clear_fields = ["email_consent", "sms_consent", "phone_consent", "organisation"]
```

The class will generate:
- Valid but non-existent email addresses for fields using `django.db.models.EmailField`.
- Random choice selections for any field with `choices` defined at the field level.
- Historic dates for fields using `django.db.models.DateField` or `django.db.models.DateTimeField`.
- Random numbers for fields using `django.db.models.IntegerField` (or one of its subclasses), `django.db.models.FloatField` or `django.db.models.DecimalField`.
- Real-looking first names for fields with one of the following names: `"first_name"`, `"forename"`, `"given_name"`, `"middle_name"`.
- Real-looking last names for fields with one of the following names: `"last_name"`, `"surname"`, `"family_name"`.
- Random strings for any other fields using `django.db.models.CharField`, `django.db.models.TextField` or a subclass of those.

If you have fields with custom validation requirements, or would simply like to generate more realistic replacement values, you can add 'generate' methods to your subclass. `BaseModelAnonymiser` will automatically look for a method matching the format `"generate_{field_name}"` when anonymising field values. For example, the following processor will generate random values for the "account_holder" and "account_number" fields:

```python
from birdbath.processors import BaseModelAnonymiser


class DirectDebitDeclarationAnonymiser(BaseModelAnonymiser):

    model = DirectDebitDeclaration
    anonymise_fields = ["account_holder", "account_number"]

    def generate_account_holder(self, field, obj):
        # Return a value to replace 'account_holder' field values
        # `field` is the field instance from the model
        # `obj` is the model instance being updated
        return self.faker.name()

    def generate_account_number(self, field, obj):
        # Return a value to replace 'account_number' field values
        # `field` is the field instance from the model
        # `obj` is the model instance being updated
        return self.faker.iban()
```
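
The `"generate_{field_name}"` naming convention can be illustrated with a small stand-alone class. This is a sketch of how such a lookup could work using `getattr`, not Birdbath's implementation; `SketchAnonymiser`, `default_value` and `value_for` are names invented for the example, and the field is passed as a plain string rather than a model field instance.

```python
class SketchAnonymiser:
    """Stand-in class for illustration; not Birdbath's BaseModelAnonymiser."""

    anonymise_fields = ["account_holder", "reference"]

    def generate_account_holder(self, field, obj):
        # Field-specific generator, found by its "generate_<name>" method name.
        return "Anonymous Holder"

    def default_value(self, field, obj):
        # Fallback used when no field-specific generator exists.
        return "xxxx"

    def value_for(self, field_name, obj=None):
        # Look up "generate_<field_name>" and fall back to the default.
        generator = getattr(self, f"generate_{field_name}", self.default_value)
        return generator(field_name, obj)
```

Here `SketchAnonymiser().value_for("account_holder")` uses the custom method, while `value_for("reference")` falls through to the default because no `generate_reference` method is defined.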


## Check/Processor Reference

### Checks

- `checks.contrib.heroku.HerokuNotProductionCheck` - fails if the `HEROKU_APP_NAME` environment variable is not set, or if it is set and includes the word `production`.
- `checks.contrib.heroku.HerokuAnonymisationAllowedCheck` - fails if the `ALLOWS_ANONYMISATION` environment variable does not match the name of the application.
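
The rules these two checks describe can be restated as plain functions. This is a sketch of the documented behaviour only, not the library's code; the function names are invented, and the second function assumes that "the name of the application" is the value of `HEROKU_APP_NAME`.

```python
from typing import Mapping


def heroku_not_production(environ: Mapping[str, str]) -> bool:
    """Pass only if HEROKU_APP_NAME is set and does not contain 'production'."""
    app_name = environ.get("HEROKU_APP_NAME")
    return bool(app_name) and "production" not in app_name


def heroku_anonymisation_allowed(environ: Mapping[str, str]) -> bool:
    """Pass only if ALLOWS_ANONYMISATION equals the application name.

    Assumption: the application name is read from HEROKU_APP_NAME.
    """
    app_name = environ.get("HEROKU_APP_NAME")
    return bool(app_name) and environ.get("ALLOWS_ANONYMISATION") == app_name
```

Passing `os.environ` to either function evaluates the rule against the current process environment.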

### Processors

- `processors.users.UserEmailAnonymiser` - replaces user email addresses with randomised addresses
- `processors.users.UserPasswordAnonymiser` - replaces user passwords with random UUIDs
- `processors.contrib.wagtail.SearchQueryCleaner` - removes the full search query history
- `processors.contrib.wagtail.FormSubmissionCleaner` - removes all form submissions


            
