| Field | Value |
|---|---|
| Name | pii-process |
| Version | 0.1.1 |
| home_page | https://github.com/piisa/pii-process |
| Summary | Full end-to-end processing for PII (preprocess, extract, decide, transform) |
| upload_time | 2024-01-24 20:36:18 |
| author | Paulo Villegas |
| requires_python | >=3.8 |
| license | Apache |
| keywords | piisa, pii |
| requirements | No requirements were recorded. |

# pii-process

Full end-to-end processing for PII (preprocess, extract, decide, transform)

## Description

This package wraps around the relevant API blocks in the full PIISA workflow:
 1. `pii-preprocess`, to read document formats
 2. `pii-extract` (plus any installed pii-extract plugins), to detect and
    extract PII instances from documents
 3. `pii-decide`, to consolidate the list of PII instances
 4. `pii-transform`, to substitute detected PII instances in documents
 
It provides both a Python API and a command-line interface.
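
The four-step workflow listed above can be pictured with a minimal, self-contained sketch. It does not use the PIISA packages or their API: every name in it (`Detection`, `preprocess`, `extract`, `decide`, `transform`, `process`) is a hypothetical stand-in, the two regular expressions are toy detectors, and the "keep the longest span" rule is only an assumed consolidation policy, not necessarily what `pii-decide` does.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical PII detection: a type label plus a character span."""
    kind: str
    start: int
    end: int


# Toy detectors standing in for extraction plugins (illustrative only)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DOMAIN = re.compile(r"@[\w-]+\.[\w.-]+")


def preprocess(raw: str) -> str:
    # Step 1 stand-in: a real preprocessor would read a document format
    return raw.strip()


def extract(text: str) -> List[Detection]:
    # Step 2 stand-in: two toy "plugins" that produce overlapping detections
    found = [Detection("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    found += [Detection("DOMAIN", m.start(), m.end()) for m in DOMAIN.finditer(text)]
    return found


def decide(detections: List[Detection]) -> List[Detection]:
    # Step 3 stand-in: on overlap, keep the longest span (an assumed policy)
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.start - d.end):  # longest first
        if all(det.end <= k.start or det.start >= k.end for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


def transform(text: str, detections: List[Detection]) -> str:
    # Step 4 stand-in: substitute spans right to left so offsets stay valid
    for det in sorted(detections, key=lambda d: d.start, reverse=True):
        text = text[:det.start] + f"<{det.kind}>" + text[det.end:]
    return text


def process(raw: str) -> str:
    # Chain the four steps in workflow order
    text = preprocess(raw)
    return transform(text, decide(extract(text)))
```

Calling `process("  Contact jane@example.com today. ")` runs the four steps in order; the overlapping `DOMAIN` detection is dropped in the decide step, so only one substitution is made.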

## Installation

All necessary PIISA packages are declared as dependencies of this package, so
they are installed along with it. All that is needed is:
 * creation of a Python virtualenv (using Python >= 3.8)
 * installation of the package in the virtualenv

The installation options are:

 * **Simple installation**: this will install the package, the packages for the
   four PIISA processing steps mentioned above, and the extraction plugin for
   PII instances using regular expressions:
   
        pip install pii-process

   The dependencies installed automatically are thus `pii-preprocess`,
   `pii-extract-base`, `pii-extract-plg-regex`, `pii-decide` and
   `pii-transform`.


 * **Complete installation**: this installs all the above, plus the extraction
   plugin for PII instances using trained Transformer models (usually to extract
   PERSON and LOCATION types for some languages):
   
        pip install pii-process[transformers]

   On top of the previous installation, this also adds the
   `pii-extract-plg-transformers` package. Note that **PyTorch needs to be
   installed too** (either the GPU or the CPU version), so that the models used
   by the `pii-extract-plg-transformers` package can run. See the transformers
   plugin documentation for more information.


 * **Alternate installation**: this option performs the simple installation and
   adds the extraction plugin for PII instances using the Presidio library
   (usually to extract PERSON and LOCATION types for some languages):
   
        pip install pii-process[presidio]

   The additional package installed in this case is
   `pii-extract-plg-presidio`. For it to work, the relevant models need to be
   downloaded; see the presidio plugin documentation for details.


It is also possible to install all plugins, i.e.
`pip install pii-process[transformers,presidio]`, though the Transformers and
Presidio plugins overlap in functionality (note that detection overlaps would be
resolved by the `pii-decide` block).
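
After choosing one of the installation options above, it can be handy to confirm which PIISA distributions ended up in the virtualenv. The following is a small sketch using only the standard library `importlib.metadata` module (available from Python 3.8); the distribution names are the ones listed in this section, while the helper name `installed_versions` and the `BASE`/`OPTIONAL` lists are made up for this example.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as given in the installation options
BASE = ["pii-process", "pii-preprocess", "pii-extract-base",
        "pii-extract-plg-regex", "pii-decide", "pii-transform"]
OPTIONAL = ["pii-extract-plg-transformers", "pii-extract-plg-presidio"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Print one line per distribution with its version or a "not installed" note
    for name, version in installed_versions(BASE + OPTIONAL).items():
        print(f"{name:32} {version or 'not installed'}")
```

A `None` entry for one of the optional plugins simply means the corresponding extra was not requested at install time.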





            

        