presidio-structured


- Name: presidio-structured
- Version: 0.0.4a0
- Summary: Presidio structured package - analyzes and anonymizes structured and semi-structured data.
- Upload time: 2025-01-13 13:02:05
- Author: Presidio
- Requires Python: `<4.0,>=3.9`
- License: MIT
- Keywords: presidio_structured

# Presidio structured

## Status

**Alpha**: This package is currently in alpha, meaning it is in its early stages of development. Features and functionality may change as the project evolves.

## Description

The Presidio structured package is a flexible, customizable framework for identifying and protecting sensitive data in structured formats. It extends Presidio to tabular data and to semi-structured formats (JSON). It uses the detection capabilities of Presidio-Analyzer to identify columns or keys that contain personally identifiable information (PII), and builds a mapping between those column or key names and the detected PII entities. Presidio-Anonymizer then applies de-identification operators to each value in the columns or keys identified as containing PII.
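
The two-phase flow described above (map column names to entity types, then apply an operator to every value in the mapped columns) can be sketched in plain Python. This is a conceptual illustration only, not the package's implementation: the regex-based `DETECTORS` table and the `detect_entity`, `build_mapping`, and `anonymize_rows` helpers are hypothetical names invented here, and real detection would be delegated to Presidio-Analyzer.

```python
import re
from collections import Counter

# Hypothetical, deliberately naive detectors standing in for Presidio-Analyzer.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "PHONE_NUMBER": re.compile(r"^\+?\d[\d\s-]{6,}\d$"),
}


def detect_entity(value):
    """Return the first entity type whose pattern matches the value, else None."""
    for entity, pattern in DETECTORS.items():
        if pattern.match(str(value)):
            return entity
    return None


def build_mapping(rows):
    """Map each column name to the entity type detected most often in its values."""
    votes = {}
    for row in rows:
        for column, value in row.items():
            entity = detect_entity(value)
            if entity is not None:
                votes.setdefault(column, Counter())[entity] += 1
    return {column: counts.most_common(1)[0][0] for column, counts in votes.items()}


def anonymize_rows(rows, mapping, operators):
    """Apply the operator for each mapped column to every value in that column."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for column, entity in mapping.items():
            if column in new_row and entity in operators:
                new_row[column] = operators[entity](new_row[column])
        result.append(new_row)
    return result


rows = [
    {"id": 1, "email": "john.doe@example.com"},
    {"id": 2, "email": "jane.smith@example.com"},
]
mapping = build_mapping(rows)
operators = {"EMAIL_ADDRESS": lambda value: "<EMAIL_ADDRESS>"}
anonymized = anonymize_rows(rows, mapping, operators)
print(mapping)
print(anonymized)
```

Separating the mapping step from the anonymization step means the mapping can be inspected or corrected by hand before any value is changed.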

## Installation

### As a Python package

To install the `presidio-structured` package, run the following command:

```sh
pip install presidio-structured
```

### Getting started

Anonymizing DataFrames:

```py
import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker # optionally using faker as an example

# Initialize the engine with a Pandas data processor (default)
pandas_engine = StructuredEngine()

# Create a sample DataFrame
sample_df = pd.DataFrame({'name': ['John Doe', 'Jane Smith'], 'email': ['john.doe@example.com', 'jane.smith@example.com']})

# Generate a tabular analysis which describes PII entities in the DataFrame.
tabular_analysis = PandasAnalysisBuilder().generate_analysis(sample_df)

# Define anonymization operators
fake = Faker()
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()})
}

# Anonymize DataFrame
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
```
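
For semi-structured JSON data, the same idea applies to keys rather than columns. The sketch below is a standard-library illustration of applying operators at mapped key paths in a nested dictionary. It does not use the presidio-structured API: the dotted-path convention and the `apply_at_path` and `anonymize_json` helpers are assumptions made for this example.

```python
import copy


def apply_at_path(data, dotted_path, transform):
    """Apply transform to the value at a dotted key path, if the path exists."""
    keys = dotted_path.split(".")
    node = data
    for key in keys[:-1]:
        if not isinstance(node, dict) or key not in node:
            return  # path absent: leave the data unchanged
        node = node[key]
    last = keys[-1]
    if isinstance(node, dict) and last in node:
        node[last] = transform(node[last])


def anonymize_json(data, key_mapping, operators):
    """Return a deep copy of data with operators applied at each mapped key path."""
    result = copy.deepcopy(data)
    for path, entity in key_mapping.items():
        if entity in operators:
            apply_at_path(result, path, operators[entity])
    return result


sample_json = {
    "id": 7,
    "user": {"name": "John Doe", "email": "john.doe@example.com"},
}
# Hypothetical mapping from key paths to entity types.
key_mapping = {"user.name": "PERSON", "user.email": "EMAIL_ADDRESS"}
json_operators = {
    "PERSON": lambda value: "REDACTED",
    "EMAIL_ADDRESS": lambda value: "<EMAIL_ADDRESS>",
}
anonymized_json = anonymize_json(sample_json, key_mapping, json_operators)
print(anonymized_json)
```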

## More information

- [Docs](https://microsoft.github.io/presidio/structured/)
- [Samples](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_structured.ipynb)
- [Join the discussion](https://github.com/microsoft/presidio/discussions?discussions_q=structured)
- [Review issues on GitHub](https://github.com/microsoft/presidio/issues?q=is%3Aissue+label%3Astructured-data)


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "presidio-structured",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "presidio_structured",
    "author": "Presidio",
    "author_email": "presidio@microsoft.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Presidio structured package - analyzes and anonymizes structured and semi-structured data.",
    "version": "0.0.4a0",
    "project_urls": {
        "Homepage": "https://github.com/microsoft/presidio"
    },
    "split_keywords": [
        "presidio_structured"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3060daa63e80c9162d0d225627314375042052485433fcd79f1e28fceaaf8704",
                "md5": "3504dfb7062d83ff61b04773d238451e",
                "sha256": "7cc63b48038a177684cb9512d481571814c04331a0f4ddeb09299cc76803258b"
            },
            "downloads": -1,
            "filename": "presidio_structured-0.0.4a0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3504dfb7062d83ff61b04773d238451e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 11350,
            "upload_time": "2025-01-13T13:02:05",
            "upload_time_iso_8601": "2025-01-13T13:02:05.875587Z",
            "url": "https://files.pythonhosted.org/packages/30/60/daa63e80c9162d0d225627314375042052485433fcd79f1e28fceaaf8704/presidio_structured-0.0.4a0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-13 13:02:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "microsoft",
    "github_project": "presidio",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "presidio-structured"
}
        