Name | presidio-structured |
Version | 0.0.4a0 |
home_page | None |
Summary | Presidio structured package - analyzes and anonymizes structured and semi-structured data. |
upload_time | 2025-01-13 13:02:05 |
maintainer | None |
docs_url | None |
author | Presidio |
requires_python | <4.0,>=3.9 |
license | MIT |
keywords | presidio_structured |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Presidio structured
## Status
**Alpha**: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.
## Description
The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured formats. It extends Presidio to tabular data and to semi-structured formats such as JSON. It uses the detection capabilities of Presidio-Analyzer to identify columns or keys that contain personally identifiable information (PII), and builds a mapping from those column or key names to the detected PII entities. Presidio-Anonymizer then applies de-identification techniques to each value in the columns or keys identified as containing PII.
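The detect, map, and anonymize flow described above can be illustrated with a small standard-library sketch. This is not the package's implementation: the regex `DETECTORS` are hypothetical stand-ins for Presidio-Analyzer, the plain callables stand in for Presidio-Anonymizer operators, and picking the most frequent entity per column is an assumption of this sketch.

```python
import re
from collections import Counter

# Hypothetical stand-in detectors; the real package delegates detection to Presidio-Analyzer.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "PHONE_NUMBER": re.compile(r"^\+?\d[\d\s-]{6,}\d$"),
}


def detect_entity(value):
    """Return the first entity type whose pattern matches the value, or None."""
    for entity, pattern in DETECTORS.items():
        if pattern.match(str(value)):
            return entity
    return None


def build_entity_mapping(table):
    """Map each column name to the entity detected most often among its values."""
    mapping = {}
    for column, values in table.items():
        counts = Counter(e for e in map(detect_entity, values) if e is not None)
        if counts:
            mapping[column] = counts.most_common(1)[0][0]
    return mapping


def anonymize(table, mapping, operators):
    """Apply the operator for each mapped column to every value in that column."""
    result = {}
    for column, values in table.items():
        operator = operators.get(mapping.get(column))
        if operator is None:
            result[column] = list(values)  # column has no detected PII or no operator
        else:
            result[column] = [operator(v) for v in values]
    return result


table = {
    "email": ["john.doe@example.com", "jane.smith@example.com"],
    "city": ["Seattle", "Boston"],
}
mapping = build_entity_mapping(table)
masked = anonymize(table, mapping, {"EMAIL_ADDRESS": lambda _: "<EMAIL>"})
print(mapping)
print(masked)
```

Keeping the mapping as a separate, inspectable step means it can be reviewed or corrected by hand before any value is rewritten.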
## Installation
### As a Python package
To install the `presidio-structured` package, run the following command:
```sh
pip install presidio-structured
```
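To confirm which version ended up in the current environment, the standard library's `importlib.metadata` can be queried. The helper name `installed_version` below is only illustrative; it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("presidio-structured"))
```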
### Getting started
Anonymizing a pandas DataFrame:
```py
import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker  # optional: Faker is used here only to generate replacement emails

# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()

# Create a sample DataFrame
sample_df = pd.DataFrame({
    "name": ["John Doe", "Jane Smith"],
    "email": ["john.doe@example.com", "jane.smith@example.com"],
})

# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)

# Define anonymization operators, keyed by detected entity type
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()}),
}

# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
```
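The same key-to-entity idea applies to semi-structured JSON. The sketch below uses only the standard library and is not the package's JSON API: the function `anonymize_json`, the dotted-path convention for nested keys, and the plain-callable operators are all assumptions made for illustration. The linked docs describe the actual JSON classes.

```python
import copy


def anonymize_json(record, entity_mapping, operators):
    """Return a copy of record with each mapped key rewritten by its entity's operator."""
    result = copy.deepcopy(record)
    for path, entity in entity_mapping.items():
        operator = operators.get(entity)
        if operator is None:
            continue  # no operator configured for this entity
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            # Walk down the nested dictionaries; stop if the path does not exist
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            node[leaf] = operator(node[leaf])
    return result


sample_json = {"id": 1, "name": "John Doe", "contact": {"email": "john.doe@example.com"}}
entity_mapping = {"name": "PERSON", "contact.email": "EMAIL_ADDRESS"}
json_operators = {
    "PERSON": lambda _: "REDACTED",
    "EMAIL_ADDRESS": lambda _: "<EMAIL>",
}
anonymized_json = anonymize_json(sample_json, entity_mapping, json_operators)
print(anonymized_json)
```

Working on a deep copy leaves the original record untouched, which is convenient when comparing input and output during testing.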
## More information
- [Docs](https://microsoft.github.io/presidio/structured/)
- [Samples](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_structured.ipynb)
- [Join the discussion](https://github.com/microsoft/presidio/discussions?discussions_q=structured)
- [Review issues on GitHub](https://github.com/microsoft/presidio/issues?q=is%3Aissue+label%3Astructured-data)
Raw data
{
"_id": null,
"home_page": null,
"name": "presidio-structured",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "presidio_structured",
"author": "Presidio",
"author_email": "presidio@microsoft.com",
"download_url": null,
"platform": null,
"description": "# Presidio structured\n\n## Status\n\n**Alpha**: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.\n\n## Description\n\nThe Presidio structured package is a flexible and customizable framework designed to identify and protect structured sensitive data. This tool extends the capabilities of Presidio, focusing on structured data formats such as tabular formats and semi-structured formats (JSON). It leverages the detection capabilities of Presidio-Analyzer to identify columns or keys containing personally identifiable information (PII), and establishes a mapping between these column/keys names and the detected PII entities. Following the detection, Presidio-Anonymizer is used to apply de-identification techniques to each value in columns identified as containing PII, ensuring the sensitive data is appropriately protected.\n\n## Installation\n\n### As a python package\n\nTo install the `presidio-structured` package, run the following command:\n\n```sh\npip install presidio-structured\n```\n\n### Getting started\n\nAnonymizing Data Frames:\n\n```py\nimport pandas as pd\nfrom presidio_structured import StructuredEngine, PandasAnalysisBuilder\nfrom presidio_anonymizer.entities import OperatorConfig\nfrom faker import Faker # optionally using faker as an example\n\n# Initialize the engine with a Pandas data processor (default)\npandas_engine = StructuredEngine()\n\n# Create a sample DataFrame\nsample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})\n\n# Generate a tabular analysis which describes PII entities in the DataFrame.\ntabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)\n\n# Define anonymization operators\nfake = Faker()\noperators = {\n \"PERSON\": OperatorConfig(\"replace\", {\"new_value\": \"REDACTED\"}),\n \"EMAIL_ADDRESS\": OperatorConfig(\"custom\", 
{\"lambda\": lambda x: fake.safe_email()})\n}\n\n# Anonymize DataFrame\nanonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)\nprint(anonymized_df)\n```\n\n## More information\n\n- [Docs](https://microsoft.github.io/presidio/structured/)\n- [Samples](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_structured.ipynb)\n- [Join the discussion](https://github.com/microsoft/presidio/discussions?discussions_q=structured)\n- [Review issues on Github](https://github.com/microsoft/presidio/issues?q=is%3Aissue+label%3Astructured-data)\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Presidio structured package - analyzes and anonymizes structured and semi-structured data.",
"version": "0.0.4a0",
"project_urls": {
"Homepage": "https://github.com/microsoft/presidio"
},
"split_keywords": [
"presidio_structured"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3060daa63e80c9162d0d225627314375042052485433fcd79f1e28fceaaf8704",
"md5": "3504dfb7062d83ff61b04773d238451e",
"sha256": "7cc63b48038a177684cb9512d481571814c04331a0f4ddeb09299cc76803258b"
},
"downloads": -1,
"filename": "presidio_structured-0.0.4a0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3504dfb7062d83ff61b04773d238451e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 11350,
"upload_time": "2025-01-13T13:02:05",
"upload_time_iso_8601": "2025-01-13T13:02:05.875587Z",
"url": "https://files.pythonhosted.org/packages/30/60/daa63e80c9162d0d225627314375042052485433fcd79f1e28fceaaf8704/presidio_structured-0.0.4a0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-13 13:02:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "microsoft",
"github_project": "presidio",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "presidio-structured"
}
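The `digests` entry in the record above can be used to check a manually downloaded wheel. A minimal sketch using only the standard library follows; the helper names `sha256_of` and `matches_digest` are illustrative, and the wheel path in the final comment assumes the file sits in the current directory.

```python
import hashlib
import hmac

# SHA-256 digest listed for presidio_structured-0.0.4a0-py3-none-any.whl in the record above
EXPECTED_SHA256 = "7cc63b48038a177684cb9512d481571814c04331a0f4ddeb09299cc76803258b"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Example (assumes the wheel was downloaded to the current directory):
# matches_digest("presidio_structured-0.0.4a0-py3-none-any.whl", EXPECTED_SHA256)
```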