# pii-data

 * Version: 0.5.0
 * Summary: Base data structures for PII Processing
 * Home page: https://github.com/piisa/pii-data
 * Author: Paulo Villegas
 * License: Apache
 * Requires Python: >=3.8
 * Keywords: piisa, pii
 * Upload time: 2024-01-01 18:57:30


This package provides base data structures for the management of PII, i.e.
Personally Identifiable Information. It does *not* contain code for processing
documents or extracting PII from them.

For the full specification embodied by these base data structures, check the
PIISA Data Specification.

## Data structures

Two main data types are defined to hold PII information: PII Entities and PII
Collections. There is also a Source Document data type.


### PII Source Document

A PII Source Document defines the raw data from which PII is detected. This
document is modeled as a number of chunks, each one having an identifier and
its data contents (a raw text excerpt, or other types of content). This is
managed in this package by the `SrcDocument` class and its subclasses.
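
To make the chunk model concrete, here is a minimal sketch using plain dataclasses. The names `Chunk` and `SketchDocument` are hypothetical and are not the package's `SrcDocument` API; they only illustrate the shape of the data (an ordered set of chunks, each with an identifier and its contents).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Chunk:
    """Hypothetical chunk: an identifier plus its data contents."""
    id: str
    data: str


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a source document: an ordered list of chunks."""
    chunks: List[Chunk] = field(default_factory=list)

    def add(self, data: str) -> Chunk:
        # Sequential identifiers are an assumption; other schemes are possible
        chunk = Chunk(id=str(len(self.chunks) + 1), data=data)
        self.chunks.append(chunk)
        return chunk

    def __iter__(self) -> Iterator[Chunk]:
        return iter(self.chunks)


doc = SketchDocument()
doc.add("My name is John Doe.")
doc.add("You can reach me at john@example.com.")
ids = [chunk.id for chunk in doc]
```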

The package can dump a Source Document to a local file, following a
standardized schema, and read it back from that file. This schema uses YAML as
its file format, and is the _only_ document read capability natively provided
by the package (to read other formats into Source Document objects there is an
auxiliary `pii-preprocess` package, or you can implement your own).

The package can also export documents as raw text files.
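
As a rough picture of what a raw text export involves, the sketch below writes the text of each chunk to a file and drops the chunk identifiers. The function name `export_raw_text`, the `(id, text)` tuple layout, and the choice to write chunks back to back with no separator are all assumptions made for illustration; the package's own export may behave differently.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def export_raw_text(chunks: Iterable[Tuple[str, str]], path: Path) -> None:
    """Write the text of each chunk to a file, dropping chunk identifiers."""
    with open(path, "w", encoding="utf-8") as f:
        for _chunk_id, text in chunks:
            f.write(text)


# Hypothetical document: two chunks given as (id, text) pairs
chunks = [("1", "First paragraph.\n"), ("2", "Second paragraph.\n")]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "doc.txt"
    export_raw_text(chunks, out)
    exported = out.read_text(encoding="utf-8")
```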


### PII Collection

A PII Collection contains a list of detected/extracted PII Entities. Each
entity contains all the information needed to correctly identify one PII
instance and locate it in the document it belongs to.

These are the PII data classes defined:
 * `PiiEntity`: a PII instance (which in turn contains a `PiiEntityInfo`
   object)
 * `PiiCollection`: the full collection of PII (the additional
   `PiiCollectionLoader` subclass can load a collection from a JSON file)
 * `PiiDetector`: an object to describe the module used to generate a given
   `PiiEntity` object
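
The relationship between these three classes can be sketched with plain dataclasses and the standard `json` module. Everything here is hypothetical: the class names (`SketchEntity`, `SketchCollection`, `SketchDetector`), the field names, and the JSON layout are assumptions chosen for illustration and do not reproduce the package's classes or its file schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SketchDetector:
    """Hypothetical description of the module that produced an entity."""
    name: str
    version: str


@dataclass
class SketchEntity:
    """Hypothetical PII instance: what it is and where it was found."""
    pii_type: str   # assumed label, e.g. "EMAIL_ADDRESS"
    value: str      # the matched string
    chunk_id: str   # chunk of the source document containing it
    pos: int        # character offset inside that chunk
    detector: SketchDetector


@dataclass
class SketchCollection:
    """Hypothetical collection: a list of entities with a JSON round trip."""
    entities: List[SketchEntity] = field(default_factory=list)

    def dumps(self) -> str:
        # asdict() also converts the nested detector into a dict
        return json.dumps([asdict(e) for e in self.entities])

    @classmethod
    def loads(cls, text: str) -> "SketchCollection":
        items = json.loads(text)
        return cls([
            SketchEntity(**{**item, "detector": SketchDetector(**item["detector"])})
            for item in items
        ])


text = "You can reach me at john@example.com."
value = "john@example.com"
detector = SketchDetector(name="regex-email", version="0.1")
entity = SketchEntity("EMAIL_ADDRESS", value, "2", text.index(value), detector)

collection = SketchCollection([entity])
restored = SketchCollection.loads(collection.dumps())
```

The chunk identifier together with the offset is what lets a consumer locate the entity again in the source document.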


## Online behaviour

There is partial support for using these data classes in a streaming fashion,
providing a way to feed data incrementally.
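
One generic way to picture incremental feeding is to write and read entities one at a time instead of as a single list. The sketch below uses one JSON object per line for this; the function names, the dictionary keys, and the line-delimited layout are assumptions for illustration only, and the package's own streaming support may work differently.

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def write_incrementally(entities: Iterable[Dict], out: TextIO) -> int:
    """Write each entity as one JSON line as soon as it is available."""
    count = 0
    for entity in entities:
        out.write(json.dumps(entity) + "\n")
        count += 1
    return count


def read_incrementally(src: TextIO) -> Iterator[Dict]:
    """Yield entities one at a time without loading the whole input."""
    for line in src:
        if line.strip():
            yield json.loads(line)


def detected() -> Iterator[Dict]:
    # Stand-in for a detector that produces results over time
    yield {"type": "EMAIL_ADDRESS", "value": "jane@example.org", "chunk": "2"}
    yield {"type": "PHONE_NUMBER", "value": "555-0100", "chunk": "5"}


buffer = io.StringIO()
written = write_incrementally(detected(), buffer)
buffer.seek(0)
restored = list(read_incrementally(buffer))
```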



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/piisa/pii-data",
    "name": "pii-data",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "PIISA, PII",
    "author": "Paulo Villegas",
    "author_email": "paulo.vllgs@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f7/b0/6703899e470e3b95e914abe03ed8e0162b62959318f28c998c2bd9190eed/pii-data-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "Base data structures for PII Processing",
    "version": "0.5.0",
    "project_urls": {
        "Download": "https://github.com/piisa/pii-data/tarball/v0.5.0",
        "Homepage": "https://github.com/piisa/pii-data"
    },
    "split_keywords": [
        "piisa",
        " pii"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f7b06703899e470e3b95e914abe03ed8e0162b62959318f28c998c2bd9190eed",
                "md5": "243b915a3088c6484626dc122f93178e",
                "sha256": "0d02dc73b6a5f5b59e60344e8e3c86c5ab48b24507d258dc4ec1cfd80e574d72"
            },
            "downloads": -1,
            "filename": "pii-data-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "243b915a3088c6484626dc122f93178e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 29455,
            "upload_time": "2024-01-01T18:57:30",
            "upload_time_iso_8601": "2024-01-01T18:57:30.141447Z",
            "url": "https://files.pythonhosted.org/packages/f7/b0/6703899e470e3b95e914abe03ed8e0162b62959318f28c998c2bd9190eed/pii-data-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-01 18:57:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "piisa",
    "github_project": "pii-data",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pii-data"
}