pynonymizer

Name	pynonymizer
Version	2.5.0
home_page	https://github.com/rwnx/pynonymizer
Summary	An anonymization tool for production databases
upload_time	2024-12-27 12:24:38
maintainer	None
docs_url	None
author	Rowan Twell
requires_python	>3.9.0
license	MIT
keywords	anonymization gdpr database mysql

# `pynonymizer` [![pynonymizer on PyPI](https://img.shields.io/pypi/v/pynonymizer)](https://pypi.org/project/pynonymizer/) [![Downloads](https://static.pepy.tech/badge/pynonymizer)](https://pepy.tech/project/pynonymizer) ![License](https://img.shields.io/pypi/l/pynonymizer)

pynonymizer is a tool for anonymizing sensitive production database dumps, allowing you to create realistic test datasets while maintaining GDPR/Data Protection compliance. It replaces personally identifiable information (PII) in your database with random, yet realistic data, using the Faker library and other functions.

Key features:

- Supports MySQL, PostgreSQL, and MSSQL databases
- Accepts various input formats (SQL, compressed files)
- Generates anonymized output in multiple formats
- Flexible data generation strategies for different use cases
- Easy-to-use command-line interface and Python library

With pynonymizer, you can safely share production database copies with developers and testers, enabling better staging environments, integration tests, and database migration simulations, without compromising user privacy.


## How does it work?
`pynonymizer` replaces personally identifiable data in your database with **realistic** pseudorandom data from the `Faker` library or from other functions.
A wide variety of data types is available, so you can pick one that suits each column, for example:

* `unique_email`
* `company`
* `file_path`
* `[...]`

Pynonymizer's main data replacement mechanism, `fake_update`, selects values at random from a small pool of generated data (`--seed-rows` controls how much Faker data is available). This approach was chosen for compatibility and speed of operation, but it does not guarantee uniqueness.
This may or may not suit your exact use case. For a full list of data generation strategies, see the docs on [strategyfiles](https://github.com/rwnx/pynonymizer/blob/main/doc/strategyfiles.md).
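
To see why a small seed pool trades uniqueness for speed, here is a conceptual sketch that uses only the standard library. It is not pynonymizer's implementation: `build_seed_pool` and `fake_update_sketch` are hypothetical names, and the placeholder strings stand in for Faker output.

```python
import random


def build_seed_pool(seed_rows: int) -> list[str]:
    """Generate a small pool of placeholder values (stand-ins for Faker data)."""
    return [f"company_{i}" for i in range(seed_rows)]


def fake_update_sketch(row_count: int, seed_rows: int, rng: random.Random) -> list[str]:
    """Assign each row a value picked at random from the seed pool."""
    pool = build_seed_pool(seed_rows)
    return [rng.choice(pool) for _ in range(row_count)]


rng = random.Random(0)
values = fake_update_sketch(row_count=1000, seed_rows=150, rng=rng)

# With more rows than pool entries, repeated values are unavoidable.
distinct = len(set(values))
print(f"{len(values)} rows, {distinct} distinct values")
```

If a column needs distinct values (for example, one with a unique index), a pool-based approach like this is not enough on its own, which is why types such as `unique_email` exist.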

### Examples

You can see strategyfile examples for existing databases in the [examples folder](https://github.com/rwnx/pynonymizer/blob/main/examples).

### Process outline

1. Restore from dumpfile to temporary database.
1. Anonymize temporary database with strategy.
1. Dump resulting data to file.
1. Drop temporary database.

If this workflow doesn't work for you, see [process control](https://github.com/rwnx/pynonymizer/blob/main/doc/process-control.md) to see if it can be adjusted to suit your needs.
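
The four steps in the process outline can be illustrated with a self-contained sketch. It uses an in-memory SQLite database purely so that it runs anywhere; SQLite is not one of pynonymizer's supported databases, and the `users` table, the `anonymize_dump` function, and the replacement values are all hypothetical.

```python
import sqlite3

# A tiny stand-in for a production dump file.
DUMP_SQL = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
INSERT INTO users (email, name) VALUES ('alice@example.com', 'Alice');
INSERT INTO users (email, name) VALUES ('bob@example.com', 'Bob');
"""


def anonymize_dump(dump_sql: str) -> str:
    # 1. Restore the dump into a temporary (in-memory) database.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(dump_sql)
        # 2. Anonymize: overwrite the PII columns with generated values.
        conn.execute(
            "UPDATE users SET email = 'user' || id || '@invalid.test', "
            "name = 'User ' || id"
        )
        conn.commit()
        # 3. Dump the anonymized data back out as SQL text.
        return "\n".join(conn.iterdump())
    finally:
        # 4. Drop the temporary database (closing an in-memory DB discards it).
        conn.close()


anonymized_sql = anonymize_dump(DUMP_SQL)
print(anonymized_sql)
```

The original email addresses never reach the output; only the rewritten rows are dumped.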

### mysql
* `mysql`/`mysqldump` Must be in $PATH
* Local or remote mysql >= 5.5
* Supported Inputs:
  * Plain SQL over stdin
  * Plain SQL file `.sql`
  * GZip-compressed SQL file `.gz` 
* Supported Outputs:
  * Plain SQL over stdout
  * Plain SQL file `.sql`
  * GZip-compressed SQL file `.gz` 
  * LZMA-compressed SQL file `.xz`

### mssql
* Requires extra dependencies: install package `pynonymizer[mssql]`
* MSSQL >= 2008
* For `RESTORE_DB`/`DUMP_DB` operations, the database server *must* be running
  locally with pynonymizer. This is because MSSQL `RESTORE` and `BACKUP` instructions
  are received by the database, so piping a local backup to a remote server is not possible.
* The anonymize process can be performed on remote servers, but you are responsible for creating/managing the target database.
* Supported Inputs:
  * Local backup file
* Supported Outputs:
  * Local backup file

### postgres
* `psql`/`pg_dump` Must be in $PATH
* Local or remote postgres server
* Supported Inputs:
  * Plain SQL over stdin
  * Plain SQL file `.sql`
  * GZip-compressed SQL file `.gz` 
* Supported Outputs:
  * Plain SQL over stdout
  * Plain SQL file `.sql`
  * GZip-compressed SQL file `.gz` 
  * LZMA-compressed SQL file `.xz`
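
As a rough illustration of how file-based outputs like those listed for mysql and postgres can be selected by extension, the sketch below opens a plain, GZip, or LZMA writer using Python's standard library. `open_sql_output` is a hypothetical helper written for this example, not part of pynonymizer's API.

```python
import gzip
import lzma
import os
import tempfile
from pathlib import Path
from typing import IO


def open_sql_output(path: str) -> IO[str]:
    """Open a text-mode writer, choosing compression from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "wt", encoding="utf-8")
    if suffix == ".xz":
        return lzma.open(path, "wt", encoding="utf-8")
    # Anything else (including .sql) is written as plain text.
    return open(path, "w", encoding="utf-8")


# Write the same statement in all three formats inside a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("out.sql", "out.sql.gz", "out.sql.xz"):
        target = os.path.join(tmp, name)
        with open_sql_output(target) as fh:
            fh.write("SELECT 1;\n")
        print(name, os.path.getsize(target), "bytes")
```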

# Getting Started

## Usage
### CLI
1. Write a [strategyfile](https://github.com/rwnx/pynonymizer/blob/main/doc/strategyfiles.md) for your database
1. Run `pynonymizer --help` for a description of the available options
1. Start Anonymizing!

### Docker

![Docker Image Version](https://img.shields.io/docker/v/rwnxt/pynonymizer?label=Docker)


pynonymizer is available as a Docker image so that you don't have to install the client tools for your database.

See https://hub.docker.com/repository/docker/rwnxt/pynonymizer

```sh
# As pynonymizer depends on strategyfiles, you'll need to create a file mount so the file can be read.
docker run --mount type=bind,source=./strategyfile.yml,target=/tmp/strategyfile.yml rwnxt/pynonymizer -s /tmp/strategyfile.yml --db-host [...]
```

### Package
Pynonymizer can also be invoked programmatically from other Python code. See the module entrypoint [pynonymizer](pynonymizer/__init__.py) or [pynonymizer/pynonymize.py](pynonymizer/pynonymize.py).

```python
import pynonymizer

pynonymizer.run(
    input_path="./backup.sql",
    strategyfile_path="./strategy.yml",
    # [...] further options are elided here
)
```
