# pii-process
Full end-to-end processing for PII (preprocess, extract, decide, transform)
## Description
This package wraps around the relevant API blocks in the full PIISA workflow:
1. `pii-preprocess`, to read document formats
2. `pii-extract` (plus any installed pii-extract plugins), to detect and
extract PII instances from documents
3. `pii-decide`, to consolidate the list of PII instances
4. `pii-transform`, to substitute detected PII instances in documents
It provides both a Python API and a command-line interface.
## Installation
All necessary PIISA packages are declared as dependencies of this package, so
they are installed along with it. All that is needed is:
* creation of a Python virtualenv (using Python >= 3.8)
* and installation of the package in the virtualenv
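
The two steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `create_env` and its arguments are illustrative and not part of `pii-process`. It checks the Python version stated above, creates the virtualenv, and returns the path of the interpreter inside it.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtualenv in env_dir and return the path of its interpreter.

    Hypothetical helper for illustration; not part of the pii-process API.
    """
    # The package metadata requires Python >= 3.8
    if sys.version_info < (3, 8):
        raise RuntimeError("pii-process requires Python >= 3.8")

    # with_pip=True bootstraps pip inside the new environment
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)

    # Interpreter location differs between Windows and POSIX layouts
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

With the interpreter path returned by `create_env("venv")`, the package can then be installed by running that interpreter with the arguments `-m pip install pii-process`, for instance through `subprocess.check_call`.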
The installation options are:
* **Simple installation**: this installs the package, the packages for the
  four PIISA processing steps mentioned above, and the extraction plugin that
  detects PII instances using regular expressions:

      pip install pii-process

  The dependencies installed automatically are thus `pii-preprocess`,
  `pii-extract-base`, `pii-extract-plg-regex`, `pii-decide` and
  `pii-transform`.
* **Complete installation**: this installs all the above, plus the extraction
  plugin that detects PII instances using trained Transformer models (usually
  to extract PERSON and LOCATION types for some languages):

      pip install pii-process[transformers]

  Compared with the simple installation, this also adds the
  `pii-extract-plg-transformers` package. Note that **PyTorch needs to be
  installed too** (either the GPU or the CPU version), so that the models used
  by the `pii-extract-plg-transformers` package can run. See the transformers
  plugin documentation for more information.
* **Alternate installation**: this option performs the simple installation and
  adds the extraction plugin that detects PII instances using the Presidio
  library (usually to extract PERSON and LOCATION types for some languages):

      pip install pii-process[presidio]

  The additional package installed in this case is
  `pii-extract-plg-presidio`. For the plugin to work, the relevant models need
  to be downloaded; see the Presidio plugin documentation for details.
It is also possible to install all plugins, i.e.
`pip install pii-process[transformers,presidio]`, though the Transformers and
Presidio plugins overlap in functionality (note that detection overlaps would
be resolved by the `pii-decide` block).
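
To see which of the packages named above ended up in the current environment, a short check with the standard library's `importlib.metadata` can help. This is only a sketch: the function name `installed_versions` is an illustrative choice, and the distribution names are the ones listed in this section.

```python
from importlib import metadata

# Distribution names as listed in the installation options above
PIISA_PACKAGES = [
    "pii-preprocess",
    "pii-extract-base",
    "pii-extract-plg-regex",
    "pii-decide",
    "pii-transform",
    "pii-extract-plg-transformers",  # only with the transformers option
    "pii-extract-plg-presidio",      # only with the presidio option
]


def installed_versions(names=PIISA_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

After a simple installation, the two optional plugin packages would be expected to show up as not installed.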
Raw data
{
"_id": null,
"home_page": "https://github.com/piisa/pii-process",
"name": "pii-process",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "PIISA, PII",
"author": "Paulo Villegas",
"author_email": "paulo.vllgs@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/fb/0a/06e225a7509622adf3b81cf70442b6af3bc4b14c4be689b0d2ed107cf3df/pii-process-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "Full end-to-end processing for PII (preprocess, extract, decide, transform)",
"version": "0.1.1",
"project_urls": {
"Download": "https://github.com/piisa/pii-process/tarball/v0.1.1",
"Homepage": "https://github.com/piisa/pii-process"
},
"split_keywords": [
"piisa",
" pii"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fb0a06e225a7509622adf3b81cf70442b6af3bc4b14c4be689b0d2ed107cf3df",
"md5": "d7415dc400de2e35bc427e849550eb16",
"sha256": "b0be99227644702e1cd42eb2428296d938ffc2991e1ad5d656916e813545411a"
},
"downloads": -1,
"filename": "pii-process-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "d7415dc400de2e35bc427e849550eb16",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 16454,
"upload_time": "2024-01-24T20:36:18",
"upload_time_iso_8601": "2024-01-24T20:36:18.790056Z",
"url": "https://files.pythonhosted.org/packages/fb/0a/06e225a7509622adf3b81cf70442b6af3bc4b14c4be689b0d2ed107cf3df/pii-process-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-24 20:36:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "piisa",
"github_project": "pii-process",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "pii-process"
}