django-scrubber


- Name: django-scrubber
- Version: 3.0.0
- Home page: https://github.com/regiohelden/django-scrubber
- Summary: Data Anonymizer for Django
- Upload time: 2024-09-10 09:11:24
- Author: RegioHelden GmbH
- License: BSD
- Requirements: coverage, coveralls, factory_boy, faker, flake8, mock

# Django Scrubber

[![Build Status](https://github.com/RegioHelden/django-scrubber/workflows/Build/badge.svg)](https://github.com/RegioHelden/django-scrubber/actions)
[![PyPI](https://img.shields.io/pypi/v/django-scrubber.svg)](https://pypi.org/project/django-scrubber/)
[![Downloads](https://pepy.tech/badge/django-scrubber)](https://pepy.tech/project/django-scrubber)

`django_scrubber` is a django app meant to help you anonymize your project's database data. It destructively alters data
directly on the DB and therefore **should not be used on production**.

The main use case is providing developers with realistic data to use during development, without having to distribute
your customers' or users' potentially sensitive information.
To accomplish this, `django_scrubber` should be plugged in as a step during the creation of your database dumps.

Simply mark the fields you want to anonymize and call the `scrub_data` management command. Data will be replaced based
on different *scrubbers* (see below), which define how the anonymous content will be generated.

To make sure no fields are forgotten as development continues, you can run the
management command `scrub_validation` in your CI/CD pipeline to check for any missing fields.

## Installation

Simply run:

```
pip install django-scrubber
```

Then add `django_scrubber` to your Django `INSTALLED_APPS`, i.e. in `settings.py` add:

```
INSTALLED_APPS = [
  ...
  'django_scrubber',
  ...
]
```

## Scrubbing data

In order to scrub data, i.e. to replace DB data with anonymized versions, `django_scrubber` must know which models and
fields it should act on, and how the data should be replaced.

There are a few different ways to select which data should be scrubbed, namely: explicitly per model field; or globally
per name or field type.

Adding scrubbers directly to a model, matching scrubbers to fields by name:

```python
from django.db.models import CharField, Model

from django_scrubber import scrubbers


class MyModel(Model):
    somefield = CharField()

    class Scrubbers:
        somefield = scrubbers.Hash('somefield')
```

Adding scrubbers globally, either by field name or field type:

```python
# (in settings.py)
from django.db.models import EmailField

from django_scrubber import scrubbers

SCRUBBER_GLOBAL_SCRUBBERS = {
    'name': scrubbers.Hash,
    EmailField: scrubbers.Hash,
}
```

Model scrubbers override field-name scrubbers, which in turn override field-type scrubbers.

To disable global scrubbing in some specific model, simply set the respective field scrubber to `None`.
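
The precedence rules above can be illustrated with a small pure-Python sketch. It is not the library's implementation: `resolve_scrubber` and the string stand-ins for scrubbers and field types are hypothetical and only model the documented lookup order, including `None` as an explicit opt-out.

```python
# Illustrative only: models the documented lookup order, not django_scrubber's code.
_UNSET = object()  # distinguishes "no entry" from an explicit None


def resolve_scrubber(field_name, field_type, model_scrubbers, global_scrubbers):
    """Return the scrubber that applies to a field, or None if it is left alone."""
    # 1. A scrubber on the model's Scrubbers class wins; None disables scrubbing.
    choice = model_scrubbers.get(field_name, _UNSET)
    if choice is not _UNSET:
        return choice
    # 2. Next, a global scrubber registered under the field's name.
    if field_name in global_scrubbers:
        return global_scrubbers[field_name]
    # 3. Finally, a global scrubber registered under the field's type.
    return global_scrubbers.get(field_type)


global_scrubbers = {"name": "Hash", "EmailField": "Faker(email)"}
model_scrubbers = {"name": "Lorem", "email": None}

by_model = resolve_scrubber("name", "CharField", model_scrubbers, global_scrubbers)
opted_out = resolve_scrubber("email", "EmailField", model_scrubbers, global_scrubbers)
by_type = resolve_scrubber("contact", "EmailField", model_scrubbers, global_scrubbers)
by_name = resolve_scrubber("name", "CharField", {}, global_scrubbers)
```

Here `by_model` comes from the model, `opted_out` is `None` because the model explicitly disabled the global scrubber, `by_type` falls through to the field-type entry, and `by_name` uses the global field-name entry.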

Scrubbers defined for non-existing fields will raise a warning but not fail the scrubbing process.

Which mechanism will be used to scrub the selected data is determined by using one of the provided scrubbers
in `django_scrubber.scrubbers`. See below for a list.
Alternatively, values may be anything that can be used as a value in a `QuerySet.update()` call (like `Func` instances,
string literals, etc.), or any `callable` that returns such an object when called with a `Field` object as argument.
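
As a sketch of the callable form, the function below returns a plain string literal, which is one of the value kinds mentioned above. The name `redacted_placeholder` is made up for this example, and the truncation to `max_length` is a precaution added here, not something the library is known to require.

```python
def redacted_placeholder(field):
    """Build a replacement value from the Django Field the scrubber is attached to."""
    text = f"redacted-{field.name}"
    # Stay within the column size when the field declares one.
    max_length = getattr(field, "max_length", None)
    return text[:max_length] if max_length else text
```

It would be referenced without calling it, for example `nickname = redacted_placeholder` inside a model's `Scrubbers` class, so that it is invoked with the field object at scrub time.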

By default, `django_scrubber` will affect all models from all registered apps. This may lead to issues with third-party
apps if the global scrubbers are too general. This can be avoided with the `SCRUBBER_APPS_LIST` setting. For instance, you
might split your `INSTALLED_APPS` into separate `SYSTEM_APPS` and `LOCAL_APPS` lists, then
set `SCRUBBER_APPS_LIST = LOCAL_APPS` to scrub only your own apps.
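
A minimal sketch of that split might look as follows. The app names under `LOCAL_APPS` are hypothetical, and `SYSTEM_APPS` and `LOCAL_APPS` are plain variable names, not settings read by Django or `django_scrubber`.

```python
# settings.py (app names under LOCAL_APPS are hypothetical)
SYSTEM_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django_scrubber",
]
LOCAL_APPS = [
    "accounts",
    "billing",
]

INSTALLED_APPS = SYSTEM_APPS + LOCAL_APPS

# Restrict scrubbing to first-party apps.
SCRUBBER_APPS_LIST = LOCAL_APPS
```

Keep in mind that models from apps left out of the list, such as `auth.User` in this sketch, are then not scrubbed.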

Finally, run `./manage.py scrub_data` to **destructively** scrub the registered fields.

### Arguments to the scrub_data command

`--model` Scrub only a single model (format `<app_label>.<model_name>`)

`--keep-sessions` Will NOT truncate all (by definition critical) session data.

`--remove-fake-data` Will truncate the database table storing preprocessed data for the Faker library.

## Built-In scrubbers

### Empty/Null

The simplest scrubbers: replace the field's content with the empty string or `NULL`, respectively.

```python
class Scrubbers:
    somefield = scrubbers.Empty
    someother = scrubbers.Null
```

These scrubbers have no options.

### Keeper

When running the validation or working in strict mode, you may want to explicitly keep certain data
instead of scrubbing it. In this case, use `scrubbers.Keep`.

```python
class Scrubbers:
    non_critical_field = scrubbers.Keep
```

This scrubber has no options.

### Hash

Simple hashing of content:

```python
class Scrubbers:
    somefield = scrubbers.Hash  # will use the field itself as source
    someotherfield = scrubbers.Hash('somefield')  # can optionally pass a different field name as hashing source
```

Currently, this uses the MD5 hash, which is supported in a wide variety of DB engines. Additionally, since security is
not the main objective, MD5's comparatively short digest is an advantage: it is less likely to be longer than the field
it is supposed to replace.
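
To see why the digest length matters, the sketch below uses Python's `hashlib` as a stand-in for the database's MD5 function. The helper `md5_for_column` and the truncation to a column's `max_length` are assumptions made for illustration; the library performs the hashing inside the database.

```python
import hashlib


def md5_for_column(value, max_length=None):
    """Hash a value and cut the hex digest down to the column size, if any."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()  # always 32 hex chars
    return digest if max_length is None else digest[:max_length]


full = md5_for_column("alice@example.com")
short = md5_for_column("alice@example.com", max_length=20)
```

The same input always yields the same digest, so hashed values stay stable between runs, while a truncated digest keeps less of the hash and is therefore more likely to collide.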

### Lorem

Simple scrubber meant to replace `TextField` with a static block of text. Has no options.

```python
class Scrubbers:
    somefield = scrubbers.Lorem
```

### Concat

Wrapper around `django.db.functions.Concat` to enable simple concatenation of scrubbers. This is useful if you want to
ensure a field's uniqueness through composition of, for instance, the `Hash` and `Faker` (see below) scrubbers.

The following will generate random email addresses by hashing the user-part and using `faker` for the domain part:

```python
class Scrubbers:
    email = scrubbers.Concat(scrubbers.Hash('email'), models.Value('@'), scrubbers.Faker('domain_name'))
```

### Faker

Replaces content with the help of [faker](https://pypi.python.org/pypi/Faker).

```python
class Scrubbers:
    first_name = scrubbers.Faker('first_name')
    last_name = scrubbers.Faker('last_name')
    past_date = scrubbers.Faker('past_date', start_date="-30d", tzinfo=None)
```

The replacements are done on the database-level and should therefore be able to cope with large amounts of data with
reasonable performance.

The `Faker` scrubber requires at least one argument: the faker provider used to generate random data.
All [faker providers](https://faker.readthedocs.io/en/latest/providers.html) are supported, and you can also register
your own custom providers.<br />
Any remaining arguments will be passed through to that provider. Please refer to the faker docs to find out whether a
provider accepts arguments and what they do.

#### Locales

Faker will be initialized with the current Django `LANGUAGE_CODE` and will populate the DB with localized data. If you
want the data in a different locale, set `SCRUBBER_FAKER_LOCALE` (see Settings below) to some other value.

#### Idempotency

By default, the faker instance used to populate the DB uses a fixed random seed, in order to ensure different scrubbings
of the same data generate the same output. This is particularly useful if the scrubbed data is imported as a dump by
developers, since changing data during troubleshooting would otherwise be confusing.

This behaviour can be changed by setting `SCRUBBER_RANDOM_SEED=None`, which ensures every scrubbing will generate random
source data.
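
The effect of a fixed seed can be sketched with Python's standard `random` module standing in for faker. The name list and the `fake_names` helper are made up for this illustration; `42` is the documented default of `SCRUBBER_RANDOM_SEED`.

```python
import random

NAMES = ["Ada", "Grace", "Linus", "Margaret", "Dennis"]


def fake_names(count, seed):
    """Draw `count` names; a fixed seed makes the sequence repeatable."""
    rng = random.Random(seed)  # seed=None seeds from OS entropy instead
    return [rng.choice(NAMES) for _ in range(count)]


first_run = fake_names(5, seed=42)
second_run = fake_names(5, seed=42)
```

Both runs produce the same list, whereas passing `seed=None` would generally give a different list on each call.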

#### Limitations

Scrubbing unique fields may lead to `IntegrityError`s, since there is no guarantee that the random content will not be
repeated. Playing with different settings for `SCRUBBER_RANDOM_SEED` and `SCRUBBER_ENTRIES_PER_PROVIDER` may alleviate
the problem.
Unfortunately, for performance reasons, the source data for scrubbing with faker is added to the database, and
arbitrarily increasing `SCRUBBER_ENTRIES_PER_PROVIDER` will significantly slow down scrubbing (besides still not
guaranteeing uniqueness).

When using `django < 2.1` on `sqlite`, a bug within Django causes field-specific scrubbing (e.g. `date_object`) to
fail. Please consider using a different database backend or upgrading to the latest Django version.

## Scrubbing third-party models

Sometimes you just don't have control over some code, but you still want to scrub the data of a given model.

A good example is the Django user model. It contains sensitive data, and you would have to override the whole model
just to add the inner `Scrubbers` class.

Here is how to do it:

1. Define your Scrubber class **somewhere** in your codebase (like a `scrubbers.py`)

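```python
# scrubbers.py
from django.conf import settings
from django.db import models

from django_scrubber import scrubbers


class UserScrubbers:
    # Set Faker's locale via the SCRUBBER_FAKER_LOCALE setting (e.g. 'de_DE').
    first_name = scrubbers.Faker('first_name')
    last_name = scrubbers.Faker('last_name')
    username = scrubbers.Faker('uuid4')
    password = scrubbers.Faker('sha1')
    last_login = scrubbers.Null
    # SCRUBBER_DOMAIN is not among the settings documented below; define it in your project settings.
    email = scrubbers.Concat(first_name, models.Value('.'), last_name, models.Value('@'),
                             models.Value(settings.SCRUBBER_DOMAIN))
```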

2. Set up a mapping between your third-party model and your scrubber class

```python
# settings.py
SCRUBBER_MAPPING = {
    "auth.User": "apps.account.scrubbers.UserScrubbers",
}
```

## Settings

### `SCRUBBER_GLOBAL_SCRUBBERS`:

Dictionary of global scrubbers. Keys should be either field names as strings or field type classes. Values should be one
of the scrubbers provided in `django_scrubber.scrubbers`.

Example:

```python
SCRUBBER_GLOBAL_SCRUBBERS = {
    'name': scrubbers.Hash,
    EmailField: scrubbers.Hash,
}
```

### `SCRUBBER_RANDOM_SEED`:

The seed used when generating random content by the Faker scrubber. Setting this to `None` means each scrubbing will
generate different data.

(default: `42`)

### `SCRUBBER_ENTRIES_PER_PROVIDER`:

Number of entries to use as source for Faker scrubber. Increasing this value will increase the randomness of generated
data, but decrease performance.

(default: `1000`)

### `SCRUBBER_SKIP_UNMANAGED`:

Do not attempt to scrub models which are not managed by the ORM.

(default: `True`)

### `SCRUBBER_APPS_LIST`:

Only scrub models belonging to these specific django apps. If unset, will scrub all installed apps.

(default: `None`)

### `SCRUBBER_ADDITIONAL_FAKER_PROVIDERS`:

Additional Faker providers to be made available to the `Faker` scrubber. Each entry must be the full dotted path to the
provider class.

(default: `{*()}`, empty set)

### `SCRUBBER_FAKER_LOCALE`:

Set an alternative locale for Faker used during the scrubbing process.

(default: `None`, falls back to Django's default locale)

### `SCRUBBER_MAPPING`:

Maps models to scrubber classes that do not have to live inside the model itself. Useful if you have no control over the
code of the models you'd like to scrub.

````python
SCRUBBER_MAPPING = {
    "auth.User": "my_app.scrubbers.UserScrubbers",
}
````

(default: `{}`)

### `SCRUBBER_STRICT_MODE`:

When strict mode is activated, you have to define a scrubbing policy for every field of every type defined in
`SCRUBBER_REQUIRED_FIELD_TYPES`. If you have unscrubbed fields and this flag is active, you can't run
`python manage.py scrub_data`.

(default: `False`)

### `SCRUBBER_REQUIRED_FIELD_TYPES`:

Defaults to all text-based Django model fields. Privacy-relevant data is usually only stored in text fields, while
numbers and booleans usually can't contain sensitive personal data. These fields will be checked when running
`python manage.py scrub_validation`.

(default: `(models.CharField, models.TextField, models.URLField, models.JSONField, models.GenericIPAddressField,
           models.EmailField,)`)

### `SCRUBBER_REQUIRED_FIELD_MODEL_WHITELIST`:

A list of models which will not be checked during `scrub_validation` or when
strict mode is active. Defaults to the non-privacy-related Django base models.
Items can either be full model names (e.g. `auth.Group`) or regular expression patterns matching
against the full model name (e.g. `re.compile(r"auth\..*")` to whitelist all auth models).

(default: `('auth.Group', 'auth.Permission', 'contenttypes.ContentType', 'sessions.Session', 'sites.Site', 
'django_scrubber.FakeData', 'db.TestModel',)`)
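
How a whitelist mixing literal names and patterns could be evaluated is sketched below. The helper `is_whitelisted` is hypothetical, and the use of `fullmatch` is an assumption; the library may apply its patterns differently.

```python
import re

WHITELIST = (
    "sessions.Session",
    re.compile(r"auth\..*"),  # every model in the auth app
)


def is_whitelisted(model_name, whitelist=WHITELIST):
    """Check a full model name against literal names and compiled patterns."""
    for item in whitelist:
        if isinstance(item, re.Pattern):
            if item.fullmatch(model_name):
                return True
        elif item == model_name:
            return True
    return False
```

Escaping the dot in the pattern keeps it from also matching unrelated apps whose label merely starts with `auth`, such as a hypothetical `authors` app.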

## Logging

Scrubber uses the default django logger. The logger name is ``django_scrubber.scrubbers``.
So if you want to log - for example - to the console, you could set up the logger like this:

````
LOGGING['loggers']['django_scrubber'] = {
    'handlers': ['console'],
    'propagate': True,
    'level': 'DEBUG',
}
````
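
The snippet above assumes that a `LOGGING` dictionary with a `loggers` section and a `console` handler already exists. A self-contained sketch of such a configuration, using only standard `logging` handler classes, could look like this:

```python
# settings.py
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Parent of `django_scrubber.scrubbers`, so it also covers that logger.
        "django_scrubber": {
            "handlers": ["console"],
            "propagate": True,
            "level": "DEBUG",
        },
    },
}
```

Because logger names are hierarchical, configuring `django_scrubber` is enough for messages from `django_scrubber.scrubbers` to reach the console handler.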

## Making a new release

[bumpversion](https://github.com/peritus/bumpversion) is used to manage releases.

Add your changes to the [CHANGELOG](./CHANGELOG.md) and run `bumpversion <major|minor|patch>`, then push (including
tags).


# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

<!--
## [Unreleased]
-->

## [3.0.0] - 2024-09-10
### Breaking
- Removed `SCRUBBER_VALIDATION_WHITELIST` in favour of `SCRUBBER_REQUIRED_FIELD_MODEL_WHITELIST` - Thanks @GitRon
### Changed
- Added Django test model `db.TestModel` to default whitelist of `SCRUBBER_REQUIRED_FIELD_MODEL_WHITELIST` - Thanks @GitRon
- Removed support for the `mock` package in unit tests
- Adjusted some default settings

## [2.1.1] - 2024-08-20
### Changed
- Fixed an issue where the management command `scrub_validation` could fail even though all models were skipped - Thanks @GitRon

## [2.1.0] - 2024-08-20
### Changed
- Added support for `Django` version `5.1` - Thanks @GitRon
- Added `SCRUBBER_VALIDATION_WHITELIST` and excluded Django core test model - Thanks @GitRon

## [2.0.0] - 2024-06-27
### Changed
- **BREAKING**: Remove support for `Django` below version `4.2`
- **BREAKING**: Remove support for `Python` below version `3.8`
- **BREAKING**: Minimum required `Faker` version is now `20.0.0`, released 11/2023
- Added support for `Django` version `5.0`
- Added support for `Python` version `3.12`
- Add docker compose setup to run tests

## [1.3.0] - 2024-06-05
### Added
- Add support for regular expressions in `SCRUBBER_REQUIRED_FIELD_MODEL_WHITELIST` - Thanks @fbinz

## [1.2.2] - 2023-11-04
### Changed
- Set `default_auto_field` for `django-scrubber` to `django.db.models.AutoField` to prevent overrides from Django settings - Thanks @GitRon

## [1.2.1] - 2023-11-03
### Invalid

## [1.2.0] - 2023-04-01
### Changed
- Added scrubber validation - Thanks @GitRon
- Added strict mode - Thanks @GitRon

## [1.1.0] - 2022-07-11
### Changed
- Invalid fields on scrubbers will no longer raise exception but just trigger warnings now
- Author list completed

## [1.0.0] - 2022-07-11
### Changed
- Meta data for python package improved - Thanks @GitRon

## [0.9.0] - 2022-06-27
### Added
- Add functionality to scrub third party models like the Django user model, see https://github.com/RegioHelden/django-scrubber#scrubbing-third-party-models - Thanks @GitRon
- Add tests for Python 3.10 - Thanks @costela

## [0.8.0] - 2022-05-01
### Added
- Add `keep-sessions` argument to scrub_data command. Will NOT truncate all (by definition critical) session data. Thanks @GitRon
- Add `remove-fake-data` argument to scrub_data command. Will truncate the database table storing preprocessed data for the Faker library. Thanks @GitRon
- Add Django 3.2 and 4.0 to test matrix
### Changed
- Remove Python 3.6 from test matrix
- Remove Django 2.2 and 3.1 from test matrix

## [0.7.0] - 2022-02-24
### Changed
- Remove upper boundary for Faker as they release non-breaking major upgrades way too often, please pin a working release in your own app

## [0.6.2] - 2022-02-08
### Changed
- Support faker 12.x

## [0.6.1] - 2022-01-25
### Changed
- Support faker 11.x

## [0.6.0] - 2021-10-18
### Added
- Add support to override Faker locale in scrubber settings
### Changed
- Publish coverage only on main repository

## [0.5.6] - 2021-10-08
### Changed
- Pin psycopg2 in CI to 2.8.6 as 2.9+ is incompatible with Django 2.2

## [0.5.5] - 2021-10-08
### Changed
- Support faker 9.x

## [0.5.4] - 2021-04-13
### Changed
- Support faker 8.x

## [0.5.3] - 2021-02-04
### Changed
- Support faker 6.x

## [0.5.2] - 2021-01-12
### Changed
- Add tests for Python 3.9
- Add tests for Django 3.1
- Support faker 5.x
- Update dev package requirements 

## [0.5.1] - 2020-10-16
### Changed
- Fix travis setup

## [0.5.0] - 2020-10-16
### Added
- Support for django-model-utils 4.x.x
### Changed
- Add compatibility for Faker 3.x.x, remove support for Faker < 0.8.0
- Remove support for Python 2.7 and 3.5
- Remove support for Django 1.x

## [0.4.4] - 2019-12-11
### Fixed
- add the same version restrictions on faker to setup.py

## [0.4.3] - 2019-12-04
### Added
- add empty and null scrubbers

### Changed
- make `Lorem` scrubber lazy, matching README

### Fixed
- set more stringent version requirements (faker >= 3 breaks builds)

## [0.4.1] - 2019-11-16
### Fixed
- correctly clear fake data model to fix successive calls to `scrub_data` (thanks [Benedikt Bauer](https://github.com/mastacheata))

## [0.4.0] - 2019-11-13
### Added
- `Faker` scrubber now supports passing arbitrary arguments to faker providers and also non-text fields (thanks [Benedikt Bauer](https://github.com/mastacheata) and [Ronny Vedrilla](https://github.com/GitRon))

## [0.3.1] - 2018-09-10
### Fixed
- [#9](https://github.com/RegioHelden/django-scrubber/pull/9) `Hash` scrubber choking on fields with `max_length=None` - Thanks to [Charlie Denton](https://github.com/meshy)

## [0.3.0] - 2018-09-06
### Added
- Finally added some basic tests (thanks [Marco De Felice](https://github.com/md-f))
- `Hash` scrubber can now also be used on sqlite

### Changed
- **BREAKING**: scrubbers that are lazily initialized now receive `Field` instances as parameters, instead of field
  names. If you have custom scrubbers depending on the previous behavior, these should be updated. Accessing the
  field's name from the object instance is trivial: `field_instance.name`. E.g.: if you have `some_field = MyCustomScrubber`
  in any of your models' `Scrubbers`, this class must accept a `Field` instance as first parameter.
  Note that explicitly initializing any of the built-in scrubbers with field names is still supported, so if you were
  just using built-in scrubbers, you should not be affected by this change.
- related to the above, `FuncField` derived classes can now do connection-based setup by implementing the
  `connection_setup` method. This is mostly useful for doing different things based on the DB vendor, and is used to
  implement `MD5()` on sqlite (see added feature above)
- Ignore proxy models when scrubbing (thanks [Marco De Felice](https://github.com/md-f))
- Expand tests to include python 3.7 and django 2.1

## [0.2.1] - 2018-08-14
### Added
- Option to scrub only one model from the management command
- Support loading additional faker providers by config setting SCRUBBER\_ADDITIONAL\_FAKER\_PROVIDERS

### Changed
- Switched changelog format to the one proposed on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)

## [0.2.0] - 2018-08-13
### Added
- scrubbers.Concat to make simple concatenation of scrubbers possible

## [0.1.4] - 2018-08-13
### Changed
- Make our README look beautiful on PyPI

## [0.1.3] - 2018-08-13
### Fixed
- [#1](https://github.com/RegioHelden/django-scrubber/pull/1) badly timed import - Thanks to [Charlie Denton](https://github.com/meshy)

## [0.1.2] - 2018-06-22
### Changed
- Use bumpversion and travis to make new releases
- rename project: django\_scrubber → django-scrubber

## [0.1.0] - 2018-06-22
### Added
- Initial release

            

Thanks @GitRon\n- Add Django 3.2 and 4.0 to test matrix\n### Changed\n- Remove Python 3.6 from test matrix\n- Remove Django 2.2 and 3.1 from test matrix\n\n## [0.7.0] - 2022-02-24\n### Changed\n- Remove upper boundary for Faker as they release non-breaking major upgrades way too often, please pin a working release in your own app\n\n## [0.6.2] - 2022-02-08\n### Changed\n- Support faker 12.x\n\n## [0.6.1] - 2022-01-25\n### Changed\n- Support faker 11.x\n\n## [0.6.0] - 2021-10-18\n### Added\n- Add support to override Faker locale in scrubber settings\n### Changed\n- Publish coverage only on main repository\n\n## [0.5.6] - 2021-10-08\n### Changed\n- Pin psycopg2 in CI to 2.8.6 as 2.9+ is incompatible with Django 2.2\n\n## [0.5.5] - 2021-10-08\n### Changed\n- Support faker 9.x\n\n## [0.5.4] - 2021-04-13\n### Changed\n- Support faker 8.x\n\n## [0.5.3] - 2021-02-04\n### Changed\n- Support faker 6.x\n\n## [0.5.2] - 2021-01-12\n### Changed\n- Add tests for Python 3.9\n- Add tests for Django 3.1\n- Support faker 5.x\n- Update dev package requirements \n\n## [0.5.1] - 2020-10-16\n### Changed\n- Fix travis setup\n\n## [0.5.0] - 2020-10-16\n### Added\n- Support for django-model-utils 4.x.x\n### Changed\n- Add compatibility for Faker 3.x.x, remove support for Faker < 0.8.0\n- Remove support for Python 2.7 and 3.5\n- Remove support for Django 1.x\n\n## [0.4.4] - 2019-12-11\n### Fixed\n- add the same version restrictions on faker to setup.py\n\n## [0.4.3] - 2019-12-04\n### Added\n- add empty and null scrubbers\n\n### Changed\n- make `Lorem` scrubber lazy, matching README\n\n### Fixed\n- set more stringent version requirements (faker >= 3 breaks builds)\n\n## [0.4.1] - 2019-11-16\n### Fixed\n- correctly clear fake data model to fix successive calls to `scrub_data` (thanks [Benedikt Bauer](https://github.com/mastacheata))\n\n## [0.4.0] - 2019-11-13\n### Added\n- `Faker` scrubber now supports passing arbitrary arguments to faker providers and also non-text fields (thanks [Benedikt 
Bauer](https://github.com/mastacheata) and [Ronny Vedrilla](https://github.com/GitRon))\n\n## [0.3.1] - 2018-09-10\n### Fixed\n- [#9](https://github.com/RegioHelden/django-scrubber/pull/9) `Hash` scrubber choking on fields with `max_length=None` - Thanks to [Charlie Denton](https://github.com/meshy)\n\n## [0.3.0] - 2018-09-06\n### Added\n- Finally added some basic tests (thanks [Marco De Felice](https://github.com/md-f))\n- `Hash` scrubber can now also be used on sqlite\n\n### Changed\n- **BREAKING**: scrubbers that are lazily initialized now receive `Field` instances as parameters, instead of field\n  names. If you have custom scrubbers depending on the previous behavior, these should be updated. Accessing the\n  field's name from the object instance is trivial: `field_instance.name`. E.g.: if you have `some_field = MyCustomScrubber`\n  in any of your models' `Scrubbers`, this class must accept a `Field` instance as first parameter.\n  Note that explicitly initializing any of the built-in scrubbers with field names is still supported, so if you were\n  just using built-in scrubbers, you should not be affected by this change.\n- related to the above, `FuncField` derived classes can now do connection-based setup by implementing the\n  `connection_setup` method. 
This is mostly useful for doing different things based on the DB vendor, and is used to\n  implement `MD5()` on sqlite (see added feature above)\n- Ignore proxy models when scrubbing (thanks [Marco De Felice](https://github.com/md-f))\n- Expand tests to include python 3.7 and django 2.1\n\n## [0.2.1] - 2018-08-14\n### Added\n- Option to scrub only one model from the management command\n- Support loading additional faker providers by config setting SCRUBBER\\_ADDITIONAL\\_FAKER\\_PROVIDERS\n\n### Changed\n- Switched changelog format to the one proposed on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)\n\n## [0.2.0] - 2018-08-13\n### Added\n- scrubbers.Concat to make simple concatenation of scrubbers possible\n\n## [0.1.4] - 2018-08-13\n### Changed\n- Make our README look beautiful on PyPI\n\n## [0.1.3] - 2018-08-13\n### Fixed\n- [#1](https://github.com/RegioHelden/django-scrubber/pull/1) badly timed import - Thanks to [Charlie Denton](https://github.com/meshy)\n\n## [0.1.2] - 2018-06-22\n### Changed\n- Use bumpversion and travis to make new releases\n- rename project: django\\_scrubber \u2192 django-scrubber\n\n## [0.1.0] - 2018-06-22\n### Added\n- Initial release\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "Data Anonymizer for Django",
    "version": "3.0.0",
    "project_urls": {
        "Bugtracker": "https://github.com/RegioHelden/django-scrubber/issues",
        "Changelog": "https://github.com/RegioHelden/django-scrubber/blob/master/CHANGELOG.md",
        "Documentation": "https://github.com/RegioHelden/django-scrubber/blob/master/README.md",
        "Homepage": "https://github.com/regiohelden/django-scrubber",
        "Maintained by": "https://github.com/RegioHelden/django-scrubber/blob/master/AUTHORS.md"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1663a74fe2a3d7bdfe36781f4941fcef202794c4ad43cafe1e8a3665a1f91444",
                "md5": "82f6ddc28e1bbbb4dd804df5d17f1b1b",
                "sha256": "d115189976848e6f13442405bcdcdc7ca16e4da76ddd64bd5f574250162c80e0"
            },
            "downloads": -1,
            "filename": "django_scrubber-3.0.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "82f6ddc28e1bbbb4dd804df5d17f1b1b",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 20329,
            "upload_time": "2024-09-10T09:11:22",
            "upload_time_iso_8601": "2024-09-10T09:11:22.878650Z",
            "url": "https://files.pythonhosted.org/packages/16/63/a74fe2a3d7bdfe36781f4941fcef202794c4ad43cafe1e8a3665a1f91444/django_scrubber-3.0.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d5cdeb6a3ade089ab6a823d357d0cb811581c8dcf02f0e2d31f9a1748c2a5d77",
                "md5": "731de8368ce134a22d672ab545c42c4e",
                "sha256": "6a9d15469af55070396e593f621138606f1ea76752c5395d9bd3c44b0ba3b176"
            },
            "downloads": -1,
            "filename": "django_scrubber-3.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "731de8368ce134a22d672ab545c42c4e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 27646,
            "upload_time": "2024-09-10T09:11:24",
            "upload_time_iso_8601": "2024-09-10T09:11:24.721549Z",
            "url": "https://files.pythonhosted.org/packages/d5/cd/eb6a3ade089ab6a823d357d0cb811581c8dcf02f0e2d31f9a1748c2a5d77/django_scrubber-3.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-10 09:11:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "regiohelden",
    "github_project": "django-scrubber",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "coverage",
            "specs": [
                [
                    "==",
                    "6.5.0"
                ]
            ]
        },
        {
            "name": "coveralls",
            "specs": [
                [
                    "==",
                    "3.3.1"
                ]
            ]
        },
        {
            "name": "factory_boy",
            "specs": [
                [
                    "==",
                    "3.3.0"
                ]
            ]
        },
        {
            "name": "faker",
            "specs": [
                [
                    "==",
                    "26.0.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "7.1.0"
                ]
            ]
        },
        {
            "name": "mock",
            "specs": [
                [
                    "==",
                    "5.1.0"
                ]
            ]
        }
    ],
    "lcname": "django-scrubber"
}
        