detectpii

* Name: detectpii
* Version: 0.1.3
* Home page: https://github.com/thescalaguy/detectpii
* Summary: Detect PII columns in your database and warehouse
* Upload time: 2024-08-09 06:52:14
* Author: Fasih Khatib
* Requires Python: <4.0,>=3.11

# 🔍 Detect PII

Detect PII is a library, inspired by [piicatcher](https://github.com/tokern/piicatcher) and [CommonRegex](https://github.com/madisonmay/CommonRegex), that detects table columns that may contain PII. It does so by running regex matches against column names and column values and flagging the columns that match.
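
To make the idea concrete, here is a minimal, standalone sketch of regex matching on column names and on sampled column values. It uses only the standard library. The pattern tables and the function names `match_column_name` and `match_column_values` are hypothetical and chosen for illustration; they are not the rules or the API shipped with Detect PII.

```python
import re

# Hypothetical patterns for illustration only; not the rules used by detectpii.
NAME_PATTERNS = {
    "email": re.compile(r"e?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}
VALUE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{6,}\d$"),
}


def match_column_name(column_name: str) -> set[str]:
    """Return the PII types whose name pattern occurs in the column name."""
    return {pii for pii, rx in NAME_PATTERNS.items() if rx.search(column_name)}


def match_column_values(values: list[str]) -> set[str]:
    """Return the PII types whose value pattern matches at least one value."""
    return {
        pii
        for pii, rx in VALUE_PATTERNS.items()
        if any(rx.match(value) for value in values)
    }
```

A column would be flagged when either function returns a non-empty set, for example `match_column_name("user_email")` or `match_column_values(["alice@example.com", "n/a"])`. Matching on names is cheap because it needs no data access, while matching on values can catch PII stored under uninformative column names.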

## Usage

### Installation

```shell
pip install detectpii
```

### Scan tables for PII

```python
from detectpii.catalog import PostgresCatalog
from detectpii.pipeline import PiiDetectionPipeline
from detectpii.scanner import DataScanner, MetadataScanner
from detectpii.util import print_columns

# -- Create a catalog to connect to a database / warehouse
pg_catalog = PostgresCatalog(
    host="localhost",
    user="postgres",
    password="my-secret-pw",
    database="postgres",
    port=5432,
    schema="public"
)

# -- Create a pipeline to detect PII in the tables
pipeline = PiiDetectionPipeline(
    catalog=pg_catalog,
    scanners=[
        MetadataScanner(),
        DataScanner(percentage=20, times=2),
    ]
)

# -- Scan for PII columns.
pii_columns = pipeline.scan()

# -- Print them to the console
print_columns(pii_columns)
```

### Persist the pipeline

```python
import json
from detectpii.pipeline import pipeline_to_dict

# -- Create a pipeline
pipeline = ...

# -- Convert it into a dictionary
dictionary = pipeline_to_dict(pipeline)

# -- Print it
print(json.dumps(dictionary, indent=4))

# {
#     "catalog": {
#         "tables": [],
#         "resolver": {
#             "name": "PlaintextResolver",
#             "_type": "PlaintextResolver"
#         },
#         "user": "postgres",
#         "password": "my-secret-pw",
#         "host": "localhost",
#         "port": 5432,
#         "database": "postgres",
#         "schema": "public",
#         "_type": "PostgresCatalog"
#     },
#     "scanners": [
#         {
#             "_type": "MetadataScanner"
#         },
#         {
#             "times": 2,
#             "percentage": 20,
#             "_type": "DataScanner"
#         }
#     ]
# }
```
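
The sample output above includes the catalog password in plain text, so it may be worth masking it before printing or logging. The sketch below assumes the dictionary has the shape shown above (a `catalog` key with an optional `password` entry). The helpers `redact_password` and `save_pipeline_dict` are hypothetical names and are not part of Detect PII; they rely only on the standard library.

```python
import copy
import json
from pathlib import Path


def redact_password(dictionary: dict) -> dict:
    """Return a deep copy of the dictionary with the catalog password masked."""
    redacted = copy.deepcopy(dictionary)
    if "password" in redacted.get("catalog", {}):
        redacted["catalog"]["password"] = "***"
    return redacted


def save_pipeline_dict(dictionary: dict, path: Path) -> None:
    """Write the dictionary to the given path as indented JSON."""
    path.write_text(json.dumps(dictionary, indent=4), encoding="utf-8")
```

For example, `print(json.dumps(redact_password(dictionary), indent=4))` prints the configuration without the secret, and `save_pipeline_dict(dictionary, Path("pipeline.json"))` writes the unredacted dictionary to a file. Because that file would still contain the password, it should be stored with appropriate access restrictions.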

### Load the pipeline

```python
from detectpii.pipeline import dict_to_pipeline

# -- Load the persisted pipeline as a dictionary
dictionary: dict = ...

# -- Convert it back to a pipeline object
pipeline = dict_to_pipeline(dictionary=dictionary)
```
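
If the dictionary was persisted as a JSON file, it can be read back with the standard library before being handed to `dict_to_pipeline`. The helper `load_pipeline_dict` below is a hypothetical name used for illustration, and the file path is an assumption.

```python
import json
from pathlib import Path


def load_pipeline_dict(path: Path) -> dict:
    """Read a JSON file and return its top-level object as a dictionary."""
    dictionary = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(dictionary, dict):
        raise ValueError(
            f"Expected a JSON object in {path}, got {type(dictionary).__name__}"
        )
    return dictionary
```

The result of `load_pipeline_dict(Path("pipeline.json"))` can then be passed to `dict_to_pipeline(dictionary=...)` as shown above.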

For more detailed documentation, please see the `docs` folder.

## Supported databases / warehouses  

* Postgres
* Snowflake
* Trino
* Yugabyte

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/thescalaguy/detectpii",
    "name": "detectpii",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": "Fasih Khatib",
    "author_email": "hellofasih.confound928@passinbox.com",
    "download_url": "https://files.pythonhosted.org/packages/01/2c/31dc78521b213c8a4d45102c405a1b3a80296f7007b0afb4b67c68244291/detectpii-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Detect PII columns in your database and warehouse",
    "version": "0.1.3",
    "project_urls": {
        "Homepage": "https://github.com/thescalaguy/detectpii",
        "Repository": "https://github.com/thescalaguy/detectpii"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a45340e96584248ecc117954fa4ff7cfbc4a987475b224b9e09150aafa049cf5",
                "md5": "982153ee2a7216ca70e11971ce03d060",
                "sha256": "00b4dc3f5ff29f21da2205cb2a2e5f818a97fefbed6d4aa6b9844f93a218ccfa"
            },
            "downloads": -1,
            "filename": "detectpii-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "982153ee2a7216ca70e11971ce03d060",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 20504,
            "upload_time": "2024-08-09T06:52:12",
            "upload_time_iso_8601": "2024-08-09T06:52:12.743257Z",
            "url": "https://files.pythonhosted.org/packages/a4/53/40e96584248ecc117954fa4ff7cfbc4a987475b224b9e09150aafa049cf5/detectpii-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "012c31dc78521b213c8a4d45102c405a1b3a80296f7007b0afb4b67c68244291",
                "md5": "f7a67f3503d470edd01ad163ee284e71",
                "sha256": "0b113b3cf87d139427527405ea123ef12e8a992d1b21157d844b57b78b5122c8"
            },
            "downloads": -1,
            "filename": "detectpii-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "f7a67f3503d470edd01ad163ee284e71",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 14751,
            "upload_time": "2024-08-09T06:52:14",
            "upload_time_iso_8601": "2024-08-09T06:52:14.995907Z",
            "url": "https://files.pythonhosted.org/packages/01/2c/31dc78521b213c8a4d45102c405a1b3a80296f7007b0afb4b67c68244291/detectpii-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-09 06:52:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "thescalaguy",
    "github_project": "detectpii",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "detectpii"
}
        