docdeid


Name: docdeid
Version: 1.0.0
Summary: Create your own document de-identifier using docdeid, a simple framework independent of language or domain.
Upload time: 2023-12-20 10:05:08
Author: Vincent Menger
Requires Python: >=3.9,<4.0
License: MIT
Keywords: python, document de-identification, de-identification, document de-identifier, de-identifier
# docdeid

[![tests](https://github.com/vmenger/docdeid/actions/workflows/test.yml/badge.svg)](https://github.com/vmenger/docdeid/actions/workflows/test.yml)
[![build](https://github.com/vmenger/docdeid/actions/workflows/build.yml/badge.svg)](https://github.com/vmenger/docdeid/actions/workflows/build.yml)
[![Documentation Status](https://readthedocs.org/projects/docdeid/badge/?version=latest)](https://docdeid.readthedocs.io/en/latest/)
[![pypi version](https://img.shields.io/pypi/v/docdeid)](https://pypi.org/project/docdeid/)
[![python versions](https://img.shields.io/pypi/pyversions/docdeid)](https://pypi.org/project/docdeid/)
[![license](https://img.shields.io/github/license/vmenger/docdeid)](https://github.com/vmenger/docdeid/blob/main/LICENSE.md)
[![black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[Installation](#installation) - [Getting started](#getting-started) - [Features](#features) - [Documentation](#documentation) - [Development and contributing](#development-and-contributing) - [Authors](#authors) - [License](#license)

<!-- start include in docs -->

Create your own document de-identifier using `docdeid`, a simple framework independent of language or domain.

> Note that `docdeid` is still under development, and breaking changes might occur. If you plan to do extensive work involving `docdeid`, feel free to get in touch to coordinate.

## Installation

Grab the latest version from PyPI:

```bash
pip install docdeid
```

## Getting started

```python
import re

from docdeid import DocDeid
from docdeid.process import RegexpAnnotator, SimpleRedactor, SingleTokenLookupAnnotator
from docdeid.tokenizer import WordBoundaryTokenizer

deidentifier = DocDeid()

# Register a tokenizer under the name "default".
deidentifier.tokenizers["default"] = WordBoundaryTokenizer()

# Annotate tokens that exactly match a known name.
deidentifier.processors.add_processor(
    "name_lookup",
    SingleTokenLookupAnnotator(lookup_values=["John", "Mary"], tag="name"),
)

# Annotate any capitalized word as a name.
deidentifier.processors.add_processor(
    "name_regexp",
    RegexpAnnotator(regexp_pattern=re.compile(r"[A-Z]\w+"), tag="name"),
)

# Replace each annotation with a numbered placeholder such as [NAME-1].
deidentifier.processors.add_processor(
    "redactor",
    SimpleRedactor(),
)

text = "John loves Mary, but Mary loves William."
doc = deidentifier.deidentify(text)
```

The returned `Document` object holds the annotations and the de-identified text:

```python
print(doc.annotations)
# AnnotationSet({
#     Annotation(text='John', start_char=0, end_char=4, tag='name', length=4),
#     Annotation(text='Mary', start_char=11, end_char=15, tag='name', length=4),
#     Annotation(text='Mary', start_char=21, end_char=25, tag='name', length=4),
#     Annotation(text='William', start_char=32, end_char=39, tag='name', length=7)
# })
```
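To make the annotation step more concrete, here is a minimal plain-Python sketch of what a single-token lookup and a regexp annotator find in the example sentence. It is not docdeid's implementation: `Span`, `word_tokens`, `lookup_annotate` and `regexp_annotate` are hypothetical names, and splitting on `\w+` only approximates `WordBoundaryTokenizer`. Both annotators find "John" and the two "Mary"s, so collecting their results in a set leaves the same four spans listed in the `AnnotationSet` above.

```python
# Hypothetical sketch, not docdeid's implementation.
from __future__ import annotations

import re
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: matched text, character offsets, tag."""

    text: str
    start_char: int
    end_char: int
    tag: str


def word_tokens(text: str) -> list[tuple[str, int, int]]:
    """Return (token, start, end) for each run of word characters (a rough tokenizer)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def lookup_annotate(text: str, lookup_values: set[str], tag: str) -> set[Span]:
    """Annotate each single token that exactly matches one of lookup_values."""
    return {
        Span(token, start, end, tag)
        for token, start, end in word_tokens(text)
        if token in lookup_values
    }


def regexp_annotate(text: str, pattern: re.Pattern[str], tag: str) -> set[Span]:
    """Annotate each non-overlapping match of pattern."""
    return {Span(m.group(), m.start(), m.end(), tag) for m in pattern.finditer(text)}


text = "John loves Mary, but Mary loves William."

# Identical spans found by both annotators collapse into one set element.
annotations = lookup_annotate(text, {"John", "Mary"}, "name") | regexp_annotate(
    text, re.compile(r"[A-Z]\w+"), "name"
)

for span in sorted(annotations, key=lambda s: s.start_char):
    print(span)
```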

```python
print(doc.deidentified_text)
# [NAME-1] loves [NAME-2], but [NAME-2] loves [NAME-3].
```
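The placeholders follow a visible pattern: within a tag, each distinct annotated text gets its own number, and repeated mentions reuse it (both "Mary"s become `[NAME-2]`). The sketch below reproduces that numbering in plain Python, assuming the annotations do not overlap; `numbered_redact` is a hypothetical helper, not docdeid's `SimpleRedactor`.

```python
# Hypothetical sketch of the numbering scheme seen above, not docdeid's SimpleRedactor.
from __future__ import annotations

from collections.abc import Iterable


def numbered_redact(text: str, spans: Iterable[tuple[str, int, int, str]]) -> str:
    """Replace (text, start, end, tag) spans with [TAG-n] placeholders.

    Within a tag, each distinct annotated text gets the next number in order of
    first appearance; repeated mentions reuse that number. Spans must not overlap.
    """
    ordered = sorted(spans, key=lambda s: s[1])
    numbers: dict[tuple[str, str], int] = {}
    counters: dict[str, int] = {}
    for span_text, _, _, tag in ordered:
        if (tag, span_text) not in numbers:
            counters[tag] = counters.get(tag, 0) + 1
            numbers[(tag, span_text)] = counters[tag]

    # Substitute right to left so earlier character offsets remain valid.
    result = text
    for span_text, start, end, tag in reversed(ordered):
        placeholder = f"[{tag.upper()}-{numbers[(tag, span_text)]}]"
        result = result[:start] + placeholder + result[end:]
    return result


spans = {
    ("John", 0, 4, "name"),
    ("Mary", 11, 15, "name"),
    ("Mary", 21, 25, "name"),
    ("William", 32, 39, "name"),
}
print(numbered_redact("John loves Mary, but Mary loves William.", spans))
# [NAME-1] loves [NAME-2], but [NAME-2] loves [NAME-3].
```

Substituting from the end of the string backwards is what lets the original character offsets be used directly, even though each placeholder has a different length than the text it replaces.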

## Features

Additionally, `docdeid` features:

- Ability to create your own `Annotator`, `AnnotationProcessor`, `Redactor` and `Tokenizer` components
- Some basic reusable components included (e.g. regexp, token lookup, token patterns)
- Callable from one interface (`DocDeid.deidentify()`)
- String processing and filtering
- Fast lookup based on sets or tries
- Anything you add! PRs welcome.
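As an illustration of trie-based lookup: a set answers single-token membership in constant time on average, while a trie over tokens can find phrases that span several tokens by extending a match one token at a time from each position. The following is a generic sketch in plain Python, not docdeid's own lookup structure; `TokenTrie` and `find_phrases` are hypothetical names, and the `str.casefold` normalization stands in for the kind of string processing mentioned in the list.

```python
# Hypothetical sketch of a token trie, not docdeid's lookup structure.
from __future__ import annotations

from typing import Callable, Optional


class TokenTrie:
    """Minimal trie keyed on tokens, for looking up multi-token phrases."""

    def __init__(self) -> None:
        self.children: dict[str, TokenTrie] = {}
        self.is_terminal = False  # True if a phrase ends at this node

    def add(self, tokens: list[str]) -> None:
        node = self
        for token in tokens:
            node = node.children.setdefault(token, TokenTrie())
        node.is_terminal = True

    def longest_match(self, tokens: list[str], start: int) -> Optional[int]:
        """Return the exclusive end index of the longest phrase at tokens[start:], or None."""
        node: Optional[TokenTrie] = self
        end = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.is_terminal:
                end = i + 1
        return end


def find_phrases(
    tokens: list[str],
    trie: TokenTrie,
    normalize: Callable[[str], str] = str.casefold,
) -> list[tuple[int, int]]:
    """Scan left to right, keeping the longest match at each position."""
    normalized = [normalize(t) for t in tokens]
    matches = []
    i = 0
    while i < len(normalized):
        end = trie.longest_match(normalized, i)
        if end is None:
            i += 1
        else:
            matches.append((i, end))
            i = end  # skip past the matched phrase
    return matches


trie = TokenTrie()
for phrase in ["New York", "New York City", "Utrecht"]:
    trie.add([t.casefold() for t in phrase.split()])

tokens = "She moved from new york city to Utrecht".split()
print([" ".join(tokens[s:e]) for s, e in find_phrases(tokens, trie)])
# ['new york city', 'Utrecht']
```

Preferring the longest match means "new york city" is reported as one phrase rather than "new york" followed by an unmatched "city".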

For a more in-depth tutorial, see: [docs/tutorial](https://docdeid.readthedocs.io/en/latest/tutorial.html)

<!-- end include in docs -->

## Documentation

For the full documentation and API reference, see: [https://docdeid.readthedocs.io/en/latest/](https://docdeid.readthedocs.io/en/latest/)

## Development and contributing

To set up a development environment, see: [docs/environment](https://docdeid.readthedocs.io/en/latest/environment.html)

For contribution guidelines, see: [docs/contributing](https://docdeid.readthedocs.io/en/latest/contributing.html)

## Authors

Vincent Menger - *Author, maintainer*

## License

This project is licensed under the MIT license - see the [LICENSE.md](LICENSE.md) file for details.
            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "docdeid",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "python,document de-identification,de-identification,document de-identifier,de-identifier",
    "author": "Vincent Menger",
    "author_email": "vmenger@protonmail.com",
    "download_url": "https://files.pythonhosted.org/packages/00/1e/a725d1d012bcc14dd671c6e456f12cadc9c37cd6ca11ffd0d02bcaaa6a58/docdeid-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Create your own document de-identifier using docdeid, a simple framework independent of language or domain.",
    "version": "1.0.0",
    "project_urls": null,
    "split_keywords": [
        "python",
        "document de-identification",
        "de-identification",
        "document de-identifier",
        "de-identifier"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1f3e33aec857ccd7739c2b8f8744384bf450f3ca6c498eadcacd4a79e2acc6b8",
                "md5": "5ee59fff68ea632243e5891585f4dc23",
                "sha256": "d5d93ec3fbd8557a9cd41b56ec3774bc3a86575d8dc6a3becd486cdf2190993b"
            },
            "downloads": -1,
            "filename": "docdeid-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5ee59fff68ea632243e5891585f4dc23",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 26263,
            "upload_time": "2023-12-20T10:05:06",
            "upload_time_iso_8601": "2023-12-20T10:05:06.586144Z",
            "url": "https://files.pythonhosted.org/packages/1f/3e/33aec857ccd7739c2b8f8744384bf450f3ca6c498eadcacd4a79e2acc6b8/docdeid-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "001ea725d1d012bcc14dd671c6e456f12cadc9c37cd6ca11ffd0d02bcaaa6a58",
                "md5": "59b825f349f551f2f2339a95a3dbe89c",
                "sha256": "fea630e1dff140eb939c6474df8fcebe428c28c94eed5a5b9ae5c218205b0948"
            },
            "downloads": -1,
            "filename": "docdeid-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "59b825f349f551f2f2339a95a3dbe89c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 21179,
            "upload_time": "2023-12-20T10:05:08",
            "upload_time_iso_8601": "2023-12-20T10:05:08.503512Z",
            "url": "https://files.pythonhosted.org/packages/00/1e/a725d1d012bcc14dd671c6e456f12cadc9c37cd6ca11ffd0d02bcaaa6a58/docdeid-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-20 10:05:08",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "docdeid"
}
        