presidio-anonymizer


- Name: presidio-anonymizer
- Version: 2.2.353
- Home page: https://github.com/microsoft/presidio
- Summary: Presidio Anonymizer package - replaces analyzed text with desired values.
- Upload time: 2024-02-12 15:44:39
- Requires Python: >=3.5
- License: MIT license
- Keywords: presidio_anonymizer
- Requirements: No requirements were recorded.

# Presidio anonymizer

## Description

The Presidio anonymizer is a Python-based module for anonymizing detected PII text
entities with desired values.

![Anonymizer Design](../docs/assets/anonymizer-design.png)

### Deploy Presidio anonymizer to Azure

Use the following button to deploy Presidio anonymizer to your Azure subscription.
 
[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fmicrosoft%2Fpresidio%2Fmain%2Fpresidio-anonymizer%2Fdeploytoazure.json)


The Presidio-Anonymizer package contains both Anonymizers and Deanonymizers.
- *Anonymizers* are used to replace the text of a PII entity with some other value.
- *Deanonymizers* are used to revert the anonymization operation,
  for example, to decrypt encrypted text.

### Anonymizer

Presidio anonymizer comes by default with the following anonymizers:

-   **Replace**: Replaces the PII with a desired value.
    -   Parameters: `new_value` - replaces the existing text with the given value.
        If `new_value` is not supplied or is empty, the PII is replaced with
        `<entity_type>`, e.g. `<PHONE_NUMBER>`.

-   **Redact**: Removes the PII completely from text.
    -   Parameters: None
-   **Hash**: Hashes the PII using either sha256, sha512 or md5. 
    -   Parameters:
        - `hash_type`: Sets the type of hashing. 
          Can be either `sha256`, `sha512` or `md5`.
          The default hash type is `sha256`.
-   **Mask**: Replaces the PII with a sequence of a given character.
    -   Parameters:

        -   `chars_to_mask`: The number of characters of the PII that should be
            replaced.
        -   `masking_char`: The character to replace them with.
        -   `from_end`: Whether to mask the PII from its end.
    
-   **Encrypt**: Encrypts the PII entity text and replaces the original with the encrypted string.
-   **Custom**: Replaces the PII with the result of a function executed on the PII string.
    -   Parameters: `lambda` - the lambda function to execute on the PII string.
        The lambda must return a string.

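The parameters listed above can be illustrated with a small standard-library sketch. It is not Presidio's implementation: the function names are hypothetical, and details such as returning the hash as a hex digest or clamping `chars_to_mask` to the text length are assumptions made for the example.

```python
import hashlib
from typing import Callable


def replace(pii: str, entity_type: str, new_value: str = "") -> str:
    # Fall back to "<ENTITY_TYPE>" when new_value is missing or empty.
    return new_value or f"<{entity_type}>"


def redact(pii: str) -> str:
    # Remove the PII completely.
    return ""


def hash_pii(pii: str, hash_type: str = "sha256") -> str:
    # Assumption: the hash is returned as a hex digest of the UTF-8 bytes.
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type}")
    return hashlib.new(hash_type, pii.encode("utf-8")).hexdigest()


def mask(pii: str, chars_to_mask: int, masking_char: str = "*",
         from_end: bool = False) -> str:
    # Assumption: chars_to_mask is clamped to the length of the PII.
    n = max(0, min(chars_to_mask, len(pii)))
    if from_end:
        return pii[: len(pii) - n] + masking_char * n
    return masking_char * n + pii[n:]


def custom(pii: str, fn: Callable[[str], str]) -> str:
    # The callable must return a string, as the Custom operator requires.
    result = fn(pii)
    if not isinstance(result, str):
        raise TypeError("custom function must return a string")
    return result


print(replace("555-1234", "PHONE_NUMBER"))       # <PHONE_NUMBER>
print(mask("555-1234", 4, "*", from_end=True))   # 555-****
print(custom("Bond", str.upper))                 # BOND
```

In the package itself these behaviors are selected by operator name and parameters rather than by calling functions directly.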

The **Encrypt** anonymizer uses the Advanced Encryption Standard (AES), also known as Rijndael, as its encryption algorithm.

-  Parameters:
    - `key`: A cryptographic key used for the encryption.
      The key must be 128, 192 or 256 bits long, in a string format.

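As a rough check of the key-length requirement, the sketch below measures a string key in bits. It assumes the key string is encoded as UTF-8 before use, which is an assumption of this example; `key_bits` and `is_valid_aes_key` are hypothetical helper names, not part of the package.

```python
VALID_KEY_BITS = (128, 192, 256)


def key_bits(key: str) -> int:
    # Assumption: the string key is converted to bytes with UTF-8.
    return len(key.encode("utf-8")) * 8


def is_valid_aes_key(key: str) -> bool:
    # AES accepts only 128-, 192- or 256-bit keys.
    return key_bits(key) in VALID_KEY_BITS


print(key_bits("WmZq4t7w!z%C&F)J"))          # 128
print(is_valid_aes_key("WmZq4t7w!z%C&F)J"))  # True
print(is_valid_aes_key("too-short"))         # False
```

A 16-character ASCII string therefore gives a 128-bit key, 24 characters give 192 bits, and 32 characters give 256 bits.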
Note: If no default anonymizer is provided,
"replace" is used as the default anonymizer for all entities.
The replacement value is the entity type, e.g. `<PHONE_NUMBER>`.

#### Handling overlaps between entities

As the input text could potentially have overlapping PII entities, there are different
anonymization scenarios:

-   **No overlap (single PII)**: When there is no overlap in spans of entities, 
    Presidio Anonymizer uses a given or default anonymization operator to anonymize 
    and replace the PII text entity.
-   **Full overlap of PII entity spans**: When entities cover the same span of text,
    the PII with the higher score is used.
    Between PIIs with identical scores, the selection is arbitrary.
-   **One PII is contained in another**: Presidio Anonymizer uses the PII with the larger text even if its score is lower.
-   **Partial intersection**: Presidio Anonymizer anonymizes each entity individually and returns a concatenation of the anonymized text.
    For example, for the text
    ```
    I'm George Washington Square Park.
    ```
    assuming one entity is `George Washington` and the other is `Washington Square Park`,
    and assuming the default anonymizer, the result would be
    ```
    I'm <PERSON><LOCATION>.
    ```

Additional examples for overlapping PII scenarios:

Text:
```
My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
03-232323.
```

-   No overlaps: Assuming only `Inigo` is recognized as NAME:
    ```
    My name is <NAME> Montoya. You Killed my Father. Prepare to die. BTW my number is:
    03-232323.
    ```
-   Full overlap: Assuming the number is recognized as PHONE_NUMBER with score of 0.7 and as SSN
    with score of 0.6, the higher score would count:
    ```
    My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
    <PHONE_NUMBER>.
    ```
-   One PII is contained in another: Assuming Inigo is recognized as FIRST_NAME and Inigo Montoya
    is recognized as NAME, the larger one will be used:
    ```
    My name is <NAME>. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
    ```
-   Partial intersection: Assuming the number 03-2323 is recognized as a PHONE_NUMBER but 232323
    is recognized as SSN:
    ```
    My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
    <PHONE_NUMBER><SSN>.
    ```
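The overlap rules above can be sketched in plain Python. This is an illustration of the documented behavior under the default "replace" anonymizer, not Presidio's actual code: `Span`, `resolve_overlaps` and `anonymize` are hypothetical names, and breaking score ties in favor of the earlier entry is an arbitrary choice made for the example.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int  # exclusive
    score: float


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Drop spans that lose under the full-overlap and containment rules."""
    kept = []
    for i, s in enumerate(spans):
        loses = False
        for j, o in enumerate(spans):
            if i == j:
                continue
            if (o.start, o.end) == (s.start, s.end):
                # Full overlap: higher score wins; ties go to the earlier entry.
                if o.score > s.score or (o.score == s.score and j < i):
                    loses = True
            elif o.start <= s.start and s.end <= o.end:
                # Containment: the larger span wins regardless of score.
                loses = True
        if not loses:
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each surviving span with <ENTITY_TYPE>, working right to left."""
    limit = len(text)
    for s in reversed(resolve_overlaps(spans)):
        # Partial intersection: clip the end to where the later entity starts.
        end = min(s.end, limit)
        text = text[: s.start] + f"<{s.entity_type}>" + text[end:]
        limit = s.start
    return text


text = "I'm George Washington Square Park."
person_start = text.index("George")
location_start = text.index("Washington")
spans = [
    Span("PERSON", person_start, person_start + len("George Washington"), 0.8),
    Span("LOCATION", location_start, location_start + len("Washington Square Park"), 0.8),
]
print(anonymize(text, spans))  # I'm <PERSON><LOCATION>.
```

Working from the end of the text backwards keeps the offsets of earlier entities valid while later ones are being replaced.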

### Deanonymizer

Presidio deanonymizer currently contains one operator:

-   **Decrypt**: Replace the encrypted text with decrypted text. 
    Uses Advanced Encryption Standard (AES) as the encryption algorithm, also known as Rijndael. 
    -  Parameters:
        -   `key` - a cryptographic key used for the encryption. 
            The key must be 128, 192 or 256 bits long, in a string format.

Note: you can use "DEFAULT" as an operator key to define an operator for all entities.

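One way to picture the "DEFAULT" key is as a fallback in a dictionary lookup. The sketch below uses plain tuples in place of `OperatorConfig`, and `pick_operator` is a hypothetical helper. That an entity-specific entry takes precedence over "DEFAULT" is an assumption of this sketch, and the final fallback to "replace" reflects the anonymizer default described earlier.

```python
from typing import Dict, Tuple

# (operator name, parameters): a stand-in for OperatorConfig in this sketch.
Operator = Tuple[str, dict]


def pick_operator(entity_type: str, operators: Dict[str, Operator]) -> Operator:
    # Assumption: an entry for the specific entity type wins over "DEFAULT".
    if entity_type in operators:
        return operators[entity_type]
    if "DEFAULT" in operators:
        return operators["DEFAULT"]
    # With no default configured, anonymization falls back to "replace".
    return ("replace", {})


operators = {
    "DEFAULT": ("redact", {}),
    "PERSON": ("replace", {"new_value": "BIP"}),
}

print(pick_operator("PERSON", operators))        # ('replace', {'new_value': 'BIP'})
print(pick_operator("PHONE_NUMBER", operators))  # ('redact', {})
print(pick_operator("PHONE_NUMBER", {}))         # ('replace', {})
```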
## Installation

### As a Python package:

To install Presidio Anonymizer, run the following, preferably in a virtual environment:

```sh
pip install presidio-anonymizer
```

#### Getting started

```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the engine.
engine = AnonymizerEngine()

# Invoke the anonymize function with the text, 
# analyzer results (potentially coming from presidio-analyzer) and
# Operators to get the anonymization output:
result = engine.anonymize(
    text="My name is Bond, James Bond",
    analyzer_results=[
        RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
        RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
    ],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
)

print(result)
```
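In this example the `start` and `end` values line up with Python slice offsets (zero-based, with `end` exclusive). A quick standard-library check, using only the text and offsets from the example above:

```python
text = "My name is Bond, James Bond"

# (start, end) pairs copied from the RecognizerResult objects in the example.
spans = [(11, 15), (17, 27)]

# Slicing with the same offsets recovers the text each entity covers.
covered = [text[start:end] for start, end in spans]
print(covered)  # ['Bond', 'James Bond']
```

Running such a check before calling the engine can help catch off-by-one errors in hand-written offsets.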
This example takes the output of the AnonymizerEngine with encrypted PII entities
and decrypts it back to the original text:
```python
from presidio_anonymizer import DeanonymizeEngine
from presidio_anonymizer.entities import OperatorResult, OperatorConfig

# Initialize the engine.
engine = DeanonymizeEngine()

# Invoke the deanonymize function with the text, anonymizer results and
# Operators to define the deanonymization type.
result = engine.deanonymize(
    text="My name is S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0=",
    entities=[
        OperatorResult(start=11, end=55, entity_type="PERSON"),
    ],
    operators={"DEFAULT": OperatorConfig("decrypt", {"key": "WmZq4t7w!z%C&F)J"})},
)

print(result)

```

### As a Docker service:

In the `presidio/presidio-anonymizer` folder, run:

```
docker-compose up -d
```

### HTTP API

See the [API Spec](https://microsoft.github.io/presidio/api-docs/api-docs.html#tag/Anonymizer) for
Anonymizer REST API reference details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/microsoft/presidio",
    "name": "presidio-anonymizer",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.5",
    "maintainer_email": "",
    "keywords": "presidio_anonymizer",
    "author": "",
    "author_email": "",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Persidio Anonymizer package - replaces analyzed text with desired values.",
    "version": "2.2.353",
    "project_urls": {
        "Homepage": "https://github.com/microsoft/presidio"
    },
    "split_keywords": [
        "presidio_anonymizer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c102c7bd13b4fa1061c79670b4a333acf5071cb6a38b64a7a66bbfccab0d4392",
                "md5": "a8fc7519d0cc4faf5e415bd799eaada1",
                "sha256": "5fc1a3ed00da0dfa468f80f6265a8550ff798d56daba32412ad1b39d67bd85cc"
            },
            "downloads": -1,
            "filename": "presidio_anonymizer-2.2.353-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a8fc7519d0cc4faf5e415bd799eaada1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.5",
            "size": 31388,
            "upload_time": "2024-02-12T15:44:39",
            "upload_time_iso_8601": "2024-02-12T15:44:39.928132Z",
            "url": "https://files.pythonhosted.org/packages/c1/02/c7bd13b4fa1061c79670b4a333acf5071cb6a38b64a7a66bbfccab0d4392/presidio_anonymizer-2.2.353-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-12 15:44:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "microsoft",
    "github_project": "presidio",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "presidio-anonymizer"
}
        