pii-extract-plg-presidio


 * Name: pii-extract-plg-presidio
 * Version: 0.3.3
 * Home page: https://github.com/piisa/pii-extract-plg-presidio
 * Summary: Presidio plugin for PII detection
 * Upload time: 2023-10-30 21:04:41
 * Author: Paulo Villegas
 * Requires Python: >=3.8
 * License: Apache
 * Keywords: piisa, pii

# Pii Extractor plugin: Presidio


This repository builds a Python package that installs a pii-extract-base
plugin to perform PII detection for text data using the Microsoft Presidio
Python library.

The name of the plugin entry point is `piisa-detectors-presidio`.
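
As a rough illustration of how an entry point with this name could be located, the sketch below scans the installed distributions with the standard library's `importlib.metadata`. The helper name `find_entry_points` is hypothetical, and scanning every group is an assumption made here because the entry-point group used by pii-extract-base is not stated in this document.

```python
from importlib.metadata import entry_points


def find_entry_points(name):
    """Return all installed entry points with the given name, in any group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable interface
        return list(eps.select(name=name))
    # Python 3.8/3.9: plain dict mapping group name -> entry points
    return [ep for group in eps.values() for ep in group if ep.name == name]


if __name__ == "__main__":
    # Prints nothing if the plugin package is not installed
    for ep in find_entry_points("piisa-detectors-presidio"):
        print(ep.group, ep.name, ep.value)
```

An empty result simply means that no installed package declares an entry point with that name.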


## Requirements

The package needs:
 * at least Python 3.8
 * the pii-data and the pii-extract-base base packages
 * the presidio-analyzer package
 * an NLP engine model for the desired language
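
The first three requirements can be checked from Python before using the plugin. The following is a minimal sketch that relies only on the standard library; the helper names `installed_versions` and `python_ok` are hypothetical, and the distribution names are the ones listed above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above
REQUIRED = ("pii-data", "pii-extract-base", "presidio-analyzer")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version (None if absent)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def python_ok(minimum=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```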


## Installation

 * Install the package: `pip install pii-extract-plg-presidio` (it will
   automatically install its dependencies, including `presidio-analyzer`)
 * Download the recognition model for the desired language(s), as instructed by
   the presidio-analyzer installation instructions. The default plugin
   configuration file defines three spaCy models:
      - English model: `python -m spacy download en_core_web_lg`
      - Spanish model: `python -m spacy download es_core_news_md`
      - Italian model: `python -m spacy download it_core_news_md`
 * For additional information on model specification, see customizing NLP
   models in the Presidio documentation. If custom models are used, the
   `nlp_config` element in the plugin configuration file must be
   adjusted accordingly.
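
To see which models still need downloading, one option is to test whether each spaCy model package is importable. The sketch below assumes that the `nlp_config` element follows the shape of Presidio's NLP engine configuration (`nlp_engine_name` plus a list of `lang_code`/`model_name` pairs); the plugin's actual configuration file may differ, and the helper name `missing_models` is hypothetical.

```python
from importlib.util import find_spec

# Assumed shape, modelled on Presidio's NLP engine configuration;
# the plugin's actual configuration file may differ.
nlp_config = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_lg"},
        {"lang_code": "es", "model_name": "es_core_news_md"},
        {"lang_code": "it", "model_name": "it_core_news_md"},
    ],
}


def missing_models(config):
    """Return (lang, model) pairs whose spaCy model package is not importable."""
    return [
        (m["lang_code"], m["model_name"])
        for m in config["models"]
        if find_spec(m["model_name"]) is None
    ]


if __name__ == "__main__":
    for lang, model in missing_models(nlp_config):
        print(f"{lang}: run 'python -m spacy download {model}'")
```

This works because spaCy models are installed as regular Python packages named after the model.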


## Usage

The package does not have any user-facing entry points (except for one console
information script, see below). Instead, upon installation it
defines a plugin entry point. This plugin is automatically picked up by the
scripts and classes in pii-extract-base, and thus its functionality is exposed
to them.

Runtime behaviour is governed by a configuration file, which determines which
Presidio recognizers are instantiated and used. The configuration also defines
which languages are available for detection, although the plugin can be
initialized with a _subset_ of those languages.

The task created from the plugin is a standard PII task object, using the
`pii_extract.build.task.MultiPiiTask` class definition. Like all PII task
objects, it is called with a `DocumentChunk` object containing the data to
analyze. The chunk **must** contain a language specification in its metadata, so
that Presidio knows which language to use. The only exception is a plugin task
built with *only one* language: in that case, if the chunk does not contain
a language specification, that single language is used.
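
The language rule described above can be sketched as follows. This is an illustration only: `Chunk` is a hypothetical stand-in (not the pii-data `DocumentChunk` class), the `lang` metadata key is assumed, and raising `ValueError` when no language can be determined is a choice made for this example rather than documented plugin behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in for a document chunk (not the pii-data class)."""
    data: str
    metadata: dict = field(default_factory=dict)


def resolve_language(chunk: Chunk, task_languages: Sequence[str]) -> str:
    """Pick the language to use for a chunk."""
    lang: Optional[str] = chunk.metadata.get("lang")  # key name is assumed
    if lang is not None:
        return lang
    # No language in the chunk: only a single-language task can proceed
    if len(task_languages) == 1:
        return task_languages[0]
    raise ValueError("chunk has no language and the task has several")


if __name__ == "__main__":
    print(resolve_language(Chunk("Hola", {"lang": "es"}), ["en", "es"]))
    print(resolve_language(Chunk("Hello"), ["en"]))
```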


## Info script

`pii-extract-presidio-info` is a command-line script that provides
information about the plugin capabilities:
  * `version`: installed package versions
  * `presidio-recognizers`: list of recognizers in Presidio
  * `presidio-entities`: the full list of entities Presidio can generate
  * `pii-entities`: the PIISA tasks that this plugin will create, by translating
    from the entities detected by Presidio (this depends on the PIISA config
    used)
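
If the information is needed from within a Python program, the script could be wrapped with `subprocess`. The sketch below is hedged: the helper name `plugin_info` is hypothetical, and passing the item name (for example `version`) as a positional argument is an assumption about the command-line syntax, which this document does not spell out.

```python
import shutil
import subprocess

SCRIPT = "pii-extract-presidio-info"


def plugin_info(item, script=SCRIPT):
    """Run the info script for one item and return its standard output.

    Returns None if the script is not found on PATH. Passing `item` as a
    positional argument is an assumption about the command-line syntax.
    """
    exe = shutil.which(script)
    if exe is None:
        return None
    result = subprocess.run([exe, item], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    output = plugin_info("version")
    print(output if output is not None else "info script not installed")
```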


## Building

The provided Makefile can be used to process the package:
 * `make pkg` will build the Python package, creating a file that can be
   installed with `pip`
 * `make unit` will launch all unit tests (using pytest, so pytest must be
   available)
 * `make install` will install the package in a Python virtualenv. The
   virtualenv is chosen in this order:
     - the one defined in the `VENV` environment variable, if it is defined
     - the virtualenv currently activated in the shell, if there is one
     - otherwise, the default `/opt/venv/pii` (it will be
       created if it does not exist)
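
The selection order for `make install` can be expressed as a short Python sketch. It mirrors the documented order only; how the Makefile actually detects an activated virtualenv is not shown in this document, so reading the standard `VIRTUAL_ENV` variable is an assumption, as is treating an empty variable as undefined. The helper name `choose_venv` is hypothetical.

```python
import os

DEFAULT_VENV = "/opt/venv/pii"


def choose_venv(env=None):
    """Return the virtualenv path following the documented order."""
    env = os.environ if env is None else env
    # 1. explicit VENV variable, 2. activated virtualenv, 3. default path
    # (empty values are treated as undefined: an assumption of this sketch)
    return env.get("VENV") or env.get("VIRTUAL_ENV") or DEFAULT_VENV


if __name__ == "__main__":
    print(choose_venv())
```

Accepting the environment as a parameter keeps the function easy to exercise without modifying the real process environment.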




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/piisa/pii-extract-plg-presidio",
    "name": "pii-extract-plg-presidio",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "PIISA, PII",
    "author": "Paulo Villegas",
    "author_email": "paulo.vllgs@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/09/73/13aa7636a3130d1fea064f296879287e1f206e896b0f26ce0dac7674879a/pii-extract-plg-presidio-0.3.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "Presidio plugin for PII detection",
    "version": "0.3.3",
    "project_urls": {
        "Download": "https://github.com/piisa/pii-extract-plg-presidio/tarball/v0.3.3",
        "Homepage": "https://github.com/piisa/pii-extract-plg-presidio"
    },
    "split_keywords": [
        "piisa",
        " pii"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "097313aa7636a3130d1fea064f296879287e1f206e896b0f26ce0dac7674879a",
                "md5": "55d0938f43f2ba3e5e628b1c3163423c",
                "sha256": "8fb5cc3b7df12881c6c7e5dc3e09eb9ad0ce37559eaa37954b928d8db4b43b57"
            },
            "downloads": -1,
            "filename": "pii-extract-plg-presidio-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "55d0938f43f2ba3e5e628b1c3163423c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 17748,
            "upload_time": "2023-10-30T21:04:41",
            "upload_time_iso_8601": "2023-10-30T21:04:41.796555Z",
            "url": "https://files.pythonhosted.org/packages/09/73/13aa7636a3130d1fea064f296879287e1f206e896b0f26ce0dac7674879a/pii-extract-plg-presidio-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-30 21:04:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "piisa",
    "github_project": "pii-extract-plg-presidio",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pii-extract-plg-presidio"
}
        