Name | presidio-anonymizer |
Version | 2.2.357 |
home_page | None |
Summary | Presidio Anonymizer package - replaces analyzed text with desired values. |
upload_time | 2025-01-13 13:01:43 |
maintainer | None |
docs_url | None |
author | Presidio |
requires_python | <4.0,>=3.9 |
license | MIT |
keywords | presidio_anonymizer |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Presidio anonymizer
## Description
The Presidio anonymizer is a Python-based module for anonymizing detected PII text
entities with desired values.

### Deploy Presidio anonymizer to Azure
Use the following link to deploy Presidio anonymizer to your Azure subscription:
[Deploy to Azure](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fmicrosoft%2Fpresidio%2Fmain%2Fpresidio-anonymizer%2Fdeploytoazure.json)

The Presidio-Anonymizer package contains both Anonymizers and Deanonymizers.
- *Anonymizers* are used to replace a PII entity text with some other value.
- *Deanonymizers* are used to revert the anonymization operation,
  for example, to decrypt an encrypted text.

### Anonymizer
Presidio anonymizer comes by default with the following anonymizers:
- **Replace**: Replaces the PII with the desired value.
  - Parameters: `new_value` - replaces the existing text with the given value.
    If `new_value` is not supplied or is empty, the PII is replaced with `<entity_type>`,
    e.g. `<PHONE_NUMBER>`.
- **Redact**: Removes the PII completely from the text.
  - Parameters: None
- **Hash**: Hashes the PII using either sha256, sha512 or md5.
  - Parameters:
    - `hash_type`: Sets the type of hashing.
      Can be either `sha256`, `sha512` or `md5`.
      The default hash type is `sha256`.
- **Mask**: Replaces the PII with a sequence of a given character.
  - Parameters:
    - `chars_to_mask`: The number of characters of the PII that should be replaced.
    - `masking_char`: The character to replace them with.
    - `from_end`: Whether to mask the PII from its end.
- **Encrypt**: Encrypts the PII entity text and replaces the original with the encrypted string.
  Uses the Advanced Encryption Standard (AES), also known as Rijndael, as the encryption algorithm.
  - Parameters:
    - `key`: A cryptographic key used for the encryption.
      The length of the key needs to be 128, 192 or 256 bits, in a string format.
- **Custom**: Replaces the PII with the result of a function executed on the PII string.
  - Parameters: `lambda`: Lambda function to execute on the PII string.
    The lambda return type must be a string.

Note: If no default anonymizer is provided,
the default anonymizer is "replace" for all entities.
The replacement value will be the entity type, e.g. `<PHONE_NUMBER>`.
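
To make the operator descriptions above concrete, here is a minimal plain-Python sketch of what each built-in anonymizer does to a single PII string. The function names are hypothetical, encryption is left out, and this is not Presidio's implementation; it only mirrors the parameters listed above.

```python
import hashlib
from typing import Callable, Optional


def replace(entity_type: str, new_value: Optional[str] = None) -> str:
    """Return new_value, or <ENTITY_TYPE> when it is missing or empty."""
    return new_value if new_value else f"<{entity_type}>"


def redact() -> str:
    """Remove the PII completely from the text."""
    return ""


def hash_pii(pii: str, hash_type: str = "sha256") -> str:
    """Hash the PII with sha256 (the default), sha512 or md5."""
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type}")
    return hashlib.new(hash_type, pii.encode("utf-8")).hexdigest()


def mask(pii: str, chars_to_mask: int, masking_char: str = "*",
         from_end: bool = False) -> str:
    """Overwrite chars_to_mask characters, counted from the start or the end."""
    n = max(0, min(chars_to_mask, len(pii)))
    if from_end:
        return pii[: len(pii) - n] + masking_char * n
    return masking_char * n + pii[n:]


def custom(pii: str, fn: Callable[[str], str]) -> str:
    """Apply a user function; as stated above, it must return a string."""
    result = fn(pii)
    if not isinstance(result, str):
        raise TypeError("the lambda must return a string")
    return result
```

For instance, `mask("03-232323", 4, "*", from_end=True)` returns `03-23****`, and `replace("PHONE_NUMBER")` returns `<PHONE_NUMBER>`.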
#### Handling overlaps between entities
As the input text could potentially have overlapping PII entities, there are different
anonymization scenarios:
- **No overlap (single PII)**: When there is no overlap in spans of entities,
  Presidio Anonymizer uses a given or default anonymization operator to anonymize
  and replace the PII text entity.
- **Full overlap of PII entity spans**: When entities cover the same substring,
  the PII with the higher score will be taken.
  Between PIIs with identical scores, the selection is arbitrary.
- **One PII is contained in another**: Presidio Anonymizer will use the PII with the larger text even if its score is lower.
- **Partial intersection**: Presidio Anonymizer will anonymize each individually and will return a concatenation of the anonymized text.
  For example, for the text
  ```
  I'm George Washington Square Park.
  ```
  Assuming one entity is `George Washington` and the other is `Washington Square Park`,
  and assuming the default anonymizer, the result would be
  ```
  I'm <PERSON><LOCATION>.
  ```

Additional examples for overlapping PII scenarios:
Text:
```
My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
03-232323.
```
- No overlaps: Assuming only `Inigo` is recognized as NAME:
  ```
  My name is <NAME> Montoya. You Killed my Father. Prepare to die. BTW my number is:
  03-232323.
  ```
- Full overlap: Assuming the number is recognized as PHONE_NUMBER with a score of 0.7 and as SSN
  with a score of 0.6, the higher score would count:
  ```
  My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
  <PHONE_NUMBER>.
  ```
- One PII is contained in another: Assuming `Inigo` is recognized as FIRST_NAME and `Inigo Montoya`
  is recognized as NAME, the larger one will be used:
  ```
  My name is <NAME>. You Killed my Father. Prepare to die. BTW my number is: 03-232323.
  ```
- Partial intersection: Assuming the number `03-2323` is recognized as a PHONE_NUMBER but `232323`
  is recognized as SSN:
  ```
  My name is Inigo Montoya. You Killed my Father. Prepare to die. BTW my number is:
  <PHONE_NUMBER><SSN>.
  ```
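
The four overlap rules can be sketched in plain Python as follows. This is an illustration of the rules as described above, under the assumption that a partially intersecting span is trimmed so that it starts where the previous one ends; the `Span`, `resolve` and `anonymize` names are hypothetical and this is not Presidio's own conflict-resolution code.

```python
from dataclasses import dataclass, replace as dc_replace
from typing import List


@dataclass(frozen=True)
class Span:
    entity_type: str
    start: int
    end: int  # exclusive
    score: float


def make_span(text: str, phrase: str, entity_type: str, score: float) -> Span:
    """Helper for the examples: locate phrase in text and build its span."""
    start = text.index(phrase)
    return Span(entity_type, start, start + len(phrase), score)


def resolve(spans: List[Span]) -> List[Span]:
    kept: List[Span] = []
    # Visit longer spans first (ties: higher score first), so a span that is
    # contained in, or identical to, an already kept span is dropped.
    for s in sorted(spans, key=lambda s: (-(s.end - s.start), -s.score)):
        if not any(k.start <= s.start and s.end <= k.end for k in kept):
            kept.append(s)
    kept.sort(key=lambda s: s.start)
    # Partial intersection: keep both, trimming the later span's start.
    result: List[Span] = []
    for s in kept:
        if result and s.start < result[-1].end:
            s = dc_replace(s, start=result[-1].end)
        result.append(s)
    return result


def anonymize(text: str, spans: List[Span]) -> str:
    """Apply the default "replace" behavior (<ENTITY_TYPE>) to each span."""
    # Replace from the end of the text so earlier offsets stay valid.
    for s in sorted(resolve(spans), key=lambda s: s.start, reverse=True):
        text = text[: s.start] + f"<{s.entity_type}>" + text[s.end :]
    return text


text = "I'm George Washington Square Park."
spans = [
    make_span(text, "George Washington", "PERSON", 0.8),
    make_span(text, "Washington Square Park", "LOCATION", 0.8),
]
print(anonymize(text, spans))  # I'm <PERSON><LOCATION>.
```

Processing the longest spans first is one simple way to let a containing entity win regardless of score, while the score only breaks ties between identical spans.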
### Deanonymizer
Presidio deanonymizer currently contains one operator:
- **Decrypt**: Replaces the encrypted text with the decrypted text.
  Uses the Advanced Encryption Standard (AES), also known as Rijndael, as the encryption algorithm.
  - Parameters:
    - `key` - the cryptographic key that was used for the encryption.
      The length of the key needs to be 128, 192 or 256 bits, in a string format.

Note: you can use "DEFAULT" as an operator key to define an operator over all entities.
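
As a rough sketch of the two rules above, the snippet below shows how a "DEFAULT" entry can act as a fallback for entity types without their own operator, and how the key-length requirement can be checked. Both helper names are hypothetical, the `(name, parameters)` tuple is a stand-in for a real operator configuration, and measuring the key as UTF-8 bytes is an assumption about how "in a string format" is interpreted.

```python
from typing import Dict, Tuple

Operator = Tuple[str, dict]  # (operator name, parameters); illustrative only


def pick_operator(entity_type: str, operators: Dict[str, Operator],
                  fallback: Operator = ("replace", {})) -> Operator:
    """Prefer an entity-specific operator, then "DEFAULT", then the fallback.

    The fallback mirrors the note in the Anonymizer section: "replace" is
    used when no default anonymizer is provided.
    """
    if entity_type in operators:
        return operators[entity_type]
    return operators.get("DEFAULT", fallback)


def key_length_is_valid(key: str) -> bool:
    """True when the key is 128, 192 or 256 bits long (assuming UTF-8)."""
    return len(key.encode("utf-8")) * 8 in (128, 192, 256)


operators = {"DEFAULT": ("decrypt", {"key": "WmZq4t7w!z%C&F)J"})}
print(pick_operator("PERSON", operators)[0])       # decrypt
print(key_length_is_valid("WmZq4t7w!z%C&F)J"))     # True
```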
## Installation
### As a Python package:
To install Presidio Anonymizer, run the following, preferably in a virtual environment:
```sh
pip install presidio-anonymizer
```
#### Getting started
```python
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Initialize the engine with logger.
engine = AnonymizerEngine()

# Invoke the anonymize function with the text,
# analyzer results (potentially coming from presidio-analyzer) and
# Operators to get the anonymization output:
result = engine.anonymize(
    text="My name is Bond, James Bond",
    analyzer_results=[
        RecognizerResult(entity_type="PERSON", start=11, end=15, score=0.8),
        RecognizerResult(entity_type="PERSON", start=17, end=27, score=0.8),
    ],
    operators={"PERSON": OperatorConfig("replace", {"new_value": "BIP"})},
)

print(result)
```
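
The `start` and `end` values in the example above are character offsets into the text, with `end` exclusive (the text is 27 characters long and the second entity ends at 27). When writing results by hand, a small stdlib helper such as the hypothetical `find_spans` below can compute them; it is only a convenience sketch and not part of the package.

```python
import re
from typing import List, Tuple


def find_spans(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every occurrence of phrase; end is exclusive."""
    return [m.span() for m in re.finditer(re.escape(phrase), text)]


text = "My name is Bond, James Bond"
print(find_spans(text, "Bond"))        # [(11, 15), (23, 27)]
print(find_spans(text, "James Bond"))  # [(17, 27)]
print(text[11:15], "|", text[17:27])   # Bond | James Bond
```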
This example takes the output of the AnonymizerEngine with encrypted PII entities
and decrypts it back to the original text:
```python
from presidio_anonymizer import DeanonymizeEngine
from presidio_anonymizer.entities import OperatorResult, OperatorConfig

# Initialize the engine with logger.
engine = DeanonymizeEngine()

# Invoke the deanonymize function with the text, anonymizer results and
# Operators to define the deanonymization type.
result = engine.deanonymize(
    text="My name is S184CMt9Drj7QaKQ21JTrpYzghnboTF9pn/neN8JME0=",
    entities=[
        OperatorResult(start=11, end=55, entity_type="PERSON"),
    ],
    operators={"DEFAULT": OperatorConfig("decrypt", {"key": "WmZq4t7w!z%C&F)J"})},
)

print(result)
```
### As a Docker service:
In the folder `presidio/presidio-anonymizer`, run:
```
docker-compose up -d
```
### HTTP API
Follow the [API Spec](https://microsoft.github.io/presidio/api-docs/api-docs.html#tag/Anonymizer) for the
Anonymizer REST API reference details.
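
As a hedged sketch of calling the service over HTTP, the snippet below builds a request using only the standard library. The `/anonymize` path and the `text` / `analyzer_results` / `anonymizers` payload fields are assumptions based on the API spec linked above, and the base URL and port are placeholders for wherever your container is exposed; verify all of them against the spec before use. The helper name `build_anonymize_request` is hypothetical.

```python
import json
import urllib.request


def build_anonymize_request(base_url: str, text: str, analyzer_results: list,
                            anonymizers: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the anonymize endpoint."""
    payload = {
        "text": text,
        "analyzer_results": analyzer_results,
        "anonymizers": anonymizers,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/anonymize",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_anonymize_request(
    "http://localhost:5001",  # assumption: adjust to your deployment
    "My name is Bond, James Bond",
    [
        {"entity_type": "PERSON", "start": 11, "end": 15, "score": 0.8},
        {"entity_type": "PERSON", "start": 17, "end": 27, "score": 0.8},
    ],
    {"PERSON": {"type": "replace", "new_value": "BIP"}},
)

# To send it once the service is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```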
Raw data
{
"_id": null,
"home_page": null,
"name": "presidio-anonymizer",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "presidio_anonymizer",
"author": "Presidio",
"author_email": "presidio@microsoft.com",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Presidio Anonymizer package - replaces analyzed text with desired values.",
"version": "2.2.357",
"project_urls": {
"Homepage": "https://github.com/Microsoft/presidio"
},
"split_keywords": [
"presidio_anonymizer"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "865a74d4f11e7b111c7570235c2ce4dac923f30c7c85fcc5d8523c8c146b9cdf",
"md5": "a69ebf9e9a2ffa1639665495f539bd2f",
"sha256": "0b3e5e0526f5950bb9b27941e5b1b01b6761295d178a8ba4cedd2771aa2aee52"
},
"downloads": -1,
"filename": "presidio_anonymizer-2.2.357-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a69ebf9e9a2ffa1639665495f539bd2f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 31209,
"upload_time": "2025-01-13T13:01:43",
"upload_time_iso_8601": "2025-01-13T13:01:43.477778Z",
"url": "https://files.pythonhosted.org/packages/86/5a/74d4f11e7b111c7570235c2ce4dac923f30c7c85fcc5d8523c8c146b9cdf/presidio_anonymizer-2.2.357-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-13 13:01:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Microsoft",
"github_project": "presidio",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "presidio-anonymizer"
}