# pii-preprocess
This package is intended for the data/document preprocessing stage in the PII
Management flow designed by PIISA.
It will contain:
 * a Python API and command-line entry points to read a number of file formats
   and convert them to PII Source Documents, as defined by pii-data
 * utilities for document transformation (to ease PII processing)
## Contents
The current contents of the package are:
 * Classes and an API for reading some file types:
    - CSV files (into Table source documents)
    - Microsoft Word files (into Sequence or Tree source documents)
    - Raw text files (into Sequence source documents or, using indentation,
      into Tree source documents)
 * A configurable loader class that can load formats by dispatching to
   appropriate subclasses
 * Some command-line scripts:
    - a generic script that uses the loader class to convert any implemented
      format to a YAML or plain text file
    - scripts for specific formats:
       * a script to convert between CSV files and the YAML canonical
         representation for Source Documents
       * a script to convert between plain text files and the YAML
         canonical representation for Source Documents
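
The sketch below illustrates the general idea behind a format-dispatching loader and indentation-based tree reading. It is not the pii-preprocess API: every name in it (`Loader`, `load_text`, `read_csv_table`, `read_text_sequence`, `read_text_tree`) is hypothetical, plain lists and dicts stand in for the pii-data Source Document classes, and plain functions stand in for the reader subclasses mentioned above.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict, List, Optional

# NOTE: all names here are hypothetical; this is not the pii-preprocess API.


def read_csv_table(text: str) -> List[List[str]]:
    """Parse CSV text into rows (a stand-in for a Table source document)."""
    return [row for row in csv.reader(io.StringIO(text))]


def read_text_sequence(text: str) -> List[str]:
    """Keep non-empty lines in order (a stand-in for a Sequence document)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_text_tree(text: str, indent: int = 2) -> List[dict]:
    """Nest lines by leading-space indentation (a stand-in for a Tree document)."""
    root: List[dict] = []
    # Stack of (depth, children list); the root sits at depth -1.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = {"text": line.strip(), "children": []}
        # Pop until the top of the stack is shallower than this line.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((depth, node["children"]))
    return root


class Loader:
    """Dispatch to a reader based on the file extension."""

    def __init__(self, readers: Optional[Dict[str, Callable]] = None):
        # Default mapping; callers can override or extend it (the "configurable" part).
        self.readers: Dict[str, Callable] = {
            ".csv": read_csv_table,
            ".txt": read_text_sequence,
        }
        self.readers.update(readers or {})

    def load_text(self, name: str, text: str):
        suffix = Path(name).suffix.lower()
        try:
            reader = self.readers[suffix]
        except KeyError:
            raise ValueError(f"no reader configured for {suffix!r}") from None
        return reader(text)
```

With this sketch, `Loader().load_text("notes.txt", text)` would return a flat list of lines, while `Loader({".txt": read_text_tree})` would switch plain text files to the indentation-based tree reader. The package's own loader may be configured quite differently.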
Raw data
{
"_id": null,
"home_page": "https://github.com/piisa/pii-preprocess",
"name": "pii-preprocess",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "PIISA, PII",
"author": "Paulo Villegas",
"author_email": "paulo.vllgs@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/2e/d0/b2b2ed7d2173616d5ba841c3fa94c37d70cf5cba9310d238c409ad1da897/pii-preprocess-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache",
"summary": "Document preprocessing for PII Management",
"version": "0.1.0",
"project_urls": {
"Download": "https://github.com/piisa/pii-preprocess/tarball/v0.1.0",
"Homepage": "https://github.com/piisa/pii-preprocess"
},
"split_keywords": [
"piisa",
" pii"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2ed0b2b2ed7d2173616d5ba841c3fa94c37d70cf5cba9310d238c409ad1da897",
"md5": "7823793804e12d1c1734d31a793821d7",
"sha256": "045e072db50abb19561225032a278833d507385185f86a5ec5ee5f9c5e56fa9b"
},
"downloads": -1,
"filename": "pii-preprocess-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "7823793804e12d1c1734d31a793821d7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 19659,
"upload_time": "2023-10-22T19:37:20",
"upload_time_iso_8601": "2023-10-22T19:37:20.869517Z",
"url": "https://files.pythonhosted.org/packages/2e/d0/b2b2ed7d2173616d5ba841c3fa94c37d70cf5cba9310d238c409ad1da897/pii-preprocess-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-22 19:37:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "piisa",
"github_project": "pii-preprocess",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "pii-preprocess"
}