anonipy


- **Name:** anonipy
- **Version:** 0.1.2
- **Home page:** None
- **Summary:** The data anonymization package
- **Upload time:** 2024-07-23 12:13:58
- **Maintainer:** Erik Novak
- **Docs URL:** None
- **Author:** Erik Novak
- **Requires Python:** >=3.8
- **License:** BSD 2-Clause License Copyright (c) 2024, Erik Novak All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- **Keywords:** python, machine learning, natural language processing, anonymization
- **Bug tracker URL:** None
- **Requirements:** spacy, gliner, gliner-spacy, transformers, bitsandbytes, lingua-language-detector, guidance, python-dateutil, sentencepiece, pypdf, python-docx, tqdm
- **Travis CI:** No Travis.
- **Coveralls test coverage:** No coveralls.
            
<p align="center">
  <img src="https://raw.githubusercontent.com/eriknovak/anonipy/main/docs/assets/imgs/logo.png" alt="logo" height="100" style="height: 100px;">
</p>

<p align="center">
  <i>Data anonymization package, supporting different anonymization strategies</i>
</p>

<p align="center">
  <a href="https://github.com/eriknovak/anonipy/actions/workflows/unittests.yaml" target="_blank">
    <img src="https://github.com/eriknovak/anonipy/actions/workflows/unittests.yaml/badge.svg" alt="Test" />
  </a>
  <a href="https://pypi.org/project/anonipy" target="_blank">
    <img src="https://img.shields.io/pypi/v/anonipy?color=%2334D058&amp;label=pypi%20package" alt="Package version" />
  </a>
  <a href="https://pypi.org/project/anonipy" target="_blank">
    <img src="https://img.shields.io/pypi/pyversions/anonipy.svg?color=%2334D058" alt="Supported Python versions" />
  </a>
</p>


---

**Documentation:** [https://eriknovak.github.io/anonipy](https://eriknovak.github.io/anonipy)

**Source code:** [https://github.com/eriknovak/anonipy](https://github.com/eriknovak/anonipy)

---

The anonipy package is a Python package for data anonymization. It is designed to be simple to use and highly customizable, supports different anonymization strategies, and is powered by LLMs.

## Requirements
Before starting the project, make sure these requirements are available:

- [python]. The Python programming language (v3.8, v3.9, v3.10).
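
The requirement above can be checked from within the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `is_listed_version` is hypothetical and not part of anonipy. Note that the package metadata only states `>=3.8`, so newer interpreters may also work.

```python
import sys

# Versions listed in this README; the package metadata itself only requires >=3.8.
LISTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def is_listed_version(version_info=sys.version_info):
    """Return True if the major.minor pair is one of the versions listed above."""
    return tuple(version_info[:2]) in LISTED_VERSIONS


if __name__ == "__main__":
    print("Python version listed in the README:", is_listed_version())
```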

## Install

```bash
pip install anonipy
```

## Upgrade

```bash
pip install anonipy --upgrade
```
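
To confirm which version is installed after an install or upgrade, the standard library's `importlib.metadata` can be queried. This is a small sketch; the helper name `installed_version` is hypothetical, and it returns `None` rather than raising when the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("anonipy version:", installed_version("anonipy"))
```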

## Example

```python
original_text = """\
Medical Record

Patient Name: John Doe
Date of Birth: 15-01-1985
Date of Examination: 20-05-2024
Social Security Number: 123-45-6789

Examination Procedure:
John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.

Medication Prescribed:

Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
Next Examination Date:
15-11-2024
"""
```

Use the language detector to detect the language of the text:

```python
from anonipy.utils.language_detector import LanguageDetector

language_detector = LanguageDetector()
language = language_detector(original_text)
```

Prepare the entity extractor and extract the personal information from the original text:

```python
from anonipy.anonymize.extractors import NERExtractor

# define the labels to be extracted and anonymized
labels = [
    {"label": "name", "type": "string"},
    {"label": "social security number", "type": "custom"},
    {"label": "date of birth", "type": "date"},
    {"label": "date", "type": "date"},
]

# initialize the NER extractor for the language and labels
extractor = NERExtractor(labels, lang=language, score_th=0.5)

# extract the entities from the original text
doc, entities = extractor(original_text)

# display the entities in the original text
extractor.display(doc)
```
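
The `score_th=0.5` argument suggests that low-confidence candidates are discarded. The sketch below illustrates that idea with the standard library only. The `Candidate` class, the scores, and the `>=` comparison are assumptions made for illustration; this is not anonipy's implementation or its entity class.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for an extracted entity; not anonipy's own class."""

    text: str
    label: str
    score: float


def filter_by_score(candidates, score_th=0.5):
    """Keep only the candidates whose confidence score reaches the threshold."""
    return [c for c in candidates if c.score >= score_th]


# Made-up scores, for illustration only.
candidates = [
    Candidate("John Doe", "name", 0.93),
    Candidate("routine", "name", 0.21),
    Candidate("15-01-1985", "date of birth", 0.88),
]
kept = filter_by_score(candidates, score_th=0.5)
print([c.text for c in kept])
```

Raising the threshold trades recall for precision: fewer false positives are anonymized, but more genuine entities may be missed.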

Use generators to create substitutes for the entities:

```python
from anonipy.anonymize.generators import (
    LLMLabelGenerator,
    DateGenerator,
    NumberGenerator,
)

# initialize the generators
llm_generator = LLMLabelGenerator()
date_generator = DateGenerator()
number_generator = NumberGenerator()

# prepare the anonymization mapping
def anonymization_mapping(text, entity):
    if entity.type == "string":
        return llm_generator.generate(entity, temperature=0.7)
    if entity.label == "date":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_MONTH")
    if entity.label == "date of birth":
        return date_generator.generate(entity, output_gen="MIDDLE_OF_THE_YEAR")
    if entity.label == "social security number":
        return number_generator.generate(entity)
    return "[REDACTED]"
```
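
The `output_gen` values hint at date generalization: the exact day is replaced by a coarser one. The following sketch shows one plausible reading of the two options using only `datetime`. The function names, the `DD-MM-YYYY` format, and the choice of 1 July as the middle of the year are assumptions; the behaviour of `DateGenerator` may differ.

```python
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # matches dates such as 20-05-2024 in the example record


def middle_of_the_month(date_text):
    """Keep the month and year, and move the day to the 15th."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)
    return parsed.replace(day=15).strftime(DATE_FORMAT)


def middle_of_the_year(date_text):
    """Keep only the year, and move the date to 1 July (assumed midpoint)."""
    parsed = datetime.strptime(date_text, DATE_FORMAT)
    return parsed.replace(month=7, day=1).strftime(DATE_FORMAT)


print(middle_of_the_month("20-05-2024"))
print(middle_of_the_year("15-01-1985"))
```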

Anonymize the text using the anonymization mapping:

```python
from anonipy.anonymize.strategies import PseudonymizationStrategy

# initialize the pseudonymization strategy
pseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)

# anonymize the original text
anonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)
```
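
Conceptually, pseudonymization swaps each entity span in the text for its substitute. The sketch below shows the span-replacement idea with plain strings. The `(start, end, substitute)` tuple format and the `replace_spans` helper are hypothetical; the structure of anonipy's `replacements` value is not shown here and may differ.

```python
def replace_spans(text, replacements):
    """Apply (start, end, substitute) replacements to text.

    Spans are processed from the end of the text backwards so that earlier
    offsets stay valid after each substitution. Spans must not overlap.
    """
    result = text
    for start, end, substitute in sorted(replacements, reverse=True):
        result = result[:start] + substitute + result[end:]
    return result


text = "Patient Name: John Doe\nDate of Birth: 15-01-1985"
name_start = text.index("John Doe")
date_start = text.index("15-01-1985")
replacements = [
    (name_start, name_start + len("John Doe"), "Mark Smith"),
    (date_start, date_start + len("15-01-1985"), "01-07-1985"),
]
print(replace_spans(text, replacements))
```

Working backwards matters because a substitute can be longer or shorter than the original span, which would otherwise shift every offset after it.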

## Acknowledgements

[Anonipy](https://eriknovak.github.io/anonipy/) is developed by the
[Department for Artificial Intelligence](http://ailab.ijs.si/) at the
[Jozef Stefan Institute](http://www.ijs.si/), and other contributors.

The project has received funding from the European Union's Horizon Europe research
and innovation programme under Grant Agreement No 101080288 ([PREPARE](https://prepare-rehab.eu/)).

<figure >
  <img src="https://github.com/eriknovak/anonipy/blob/main/docs/assets/imgs/EU.png?raw=true" alt="European Union flag" width="80" />
</figure>

[python]: https://www.python.org/

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "anonipy",
    "maintainer": "Erik Novak",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "python, machine learning, natural language processing, anonymization",
    "author": "Erik Novak",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/eb/39/3056a632a59d371b63a189d2dec9cf095a71d8958a28ce98ae027e296ad7/anonipy-0.1.2.tar.gz",
    "platform": null,
    "description": "\n<p align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/eriknovak/anonipy/main/docs/assets/imgs/logo.png\" alt=\"logo\" height=\"100\" style=\"height: 100px;\">\n</p>\n\n<p align=\"center\">\n  <i>Data anonymization package, supporting different anonymization strategies</i>\n</p>\n\n<p align=\"center\">\n  <a href=\"https://github.com/eriknovak/anonipy/actions/workflows/unittests.yaml\" target=\"_blank\">\n    <img src=\"https://github.com/eriknovak/anonipy/actions/workflows/unittests.yaml/badge.svg\" alt=\"Test\" />\n  </a>\n  <a href=\"https://pypi.org/project/anonipy\" target=\"_blank\">\n    <img src=\"https://img.shields.io/pypi/v/anonipy?color=%2334D058&amp;label=pypi%20package\" alt=\"Package version\" />\n  </a>\n  <a href=\"https://pypi.org/project/anonipy\" target=\"_blank\">\n    <img src=\"https://img.shields.io/pypi/pyversions/anonipy.svg?color=%2334D058\" alt=\"Supported Python versions\" />\n  </a>\n</p>\n\n\n---\n\n**Documentation:** [https://eriknovak.github.io/anonipy](https://eriknovak.github.io/anonipy)\n\n**Source code:** [https://github.com/eriknovak/anonipy](https://github.com/eriknovak/anonipy)\n\n---\n\nThe anonipy package is a python package for data anonymization. It is designed to be simple to use and highly customizable, supporting different anonymization strategies. Powered by LLMs.\n\n## Requirements\nBefore starting the project make sure these requirements are available:\n\n- [python]. The python programming language (v3.8, v3.9, v3.10).\n\n## Install\n\n```bash\npip install anonipy\n```\n\n## Upgrade\n\n```bash\npip install anonipy --upgrade\n```\n\n## Example\n\n```python\noriginal_text = \"\"\"\\\nMedical Record\n\nPatient Name: John Doe\nDate of Birth: 15-01-1985\nDate of Examination: 20-05-2024\nSocial Security Number: 123-45-6789\n\nExamination Procedure:\nJohn Doe underwent a routine physical examination. 
The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.\n\nMedication Prescribed:\n\nIbuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.\nLisinopril 10 mg: Take one tablet daily to manage high blood pressure.\nNext Examination Date:\n15-11-2024\n\"\"\"\n```\n\nUse the language detector to detect the language of the text:\n\n```python\nfrom anonipy.utils.language_detector import LanguageDetector\n\nlanguage_detector = LanguageDetector()\nlanguage = language_detector(original_text)\n```\n\nPrepare the entity extractor and extract the personal infomation from the original text:\n\n```python\nfrom anonipy.anonymize.extractors import NERExtractor\n\n# define the labels to be extracted and anonymized\nlabels = [\n    {\"label\": \"name\", \"type\": \"string\"},\n    {\"label\": \"social security number\", \"type\": \"custom\"},\n    {\"label\": \"date of birth\", \"type\": \"date\"},\n    {\"label\": \"date\", \"type\": \"date\"},\n]\n\n# initialize the NER extractor for the language and labels\nextractor = NERExtractor(labels, lang=language, score_th=0.5)\n\n# extract the entities from the original text\ndoc, entities = extractor(original_text)\n\n# display the entities in the original text\nextractor.display(doc)\n```\n\nUse generators to create substitutes for the entities:\n\n```python\nfrom anonipy.anonymize.generators import (\n    LLMLabelGenerator,\n    DateGenerator,\n    NumberGenerator,\n)\n\n# initialize the generators\nllm_generator = LLMLabelGenerator()\ndate_generator = DateGenerator()\nnumber_generator = NumberGenerator()\n\n# prepare the anonymization mapping\ndef anonymization_mapping(text, entity):\n    if entity.type == \"string\":\n        return 
llm_generator.generate(entity, temperature=0.7)\n    if entity.label == \"date\":\n        return date_generator.generate(entity, output_gen=\"MIDDLE_OF_THE_MONTH\")\n    if entity.label == \"date of birth\":\n        return date_generator.generate(entity, output_gen=\"MIDDLE_OF_THE_YEAR\")\n    if entity.label == \"social security number\":\n        return number_generator.generate(entity)\n    return \"[REDACTED]\"\n```\n\nAnonymize the text using the anonymization mapping:\n\n```python\nfrom anonipy.anonymize.strategies import PseudonymizationStrategy\n\n# initialize the pseudonymization strategy\npseudo_strategy = PseudonymizationStrategy(mapping=anonymization_mapping)\n\n# anonymize the original text\nanonymized_text, replacements = pseudo_strategy.anonymize(original_text, entities)\n```\n\n## Acknowledgements\n\n[Anonipy](https://eriknovak.github.io/anonipy/) is developed by the\n[Department for Artificial Intelligence](http://ailab.ijs.si/) at the\n[Jozef Stefan Institute](http://www.ijs.si/), and other contributors.\n\nThe project has received funding from the European Union's Horizon Europe research\nand innovation programme under Grant Agreement No 101080288 ([PREPARE](https://prepare-rehab.eu/)).\n\n<figure >\n  <img src=\"https://github.com/eriknovak/anonipy/blob/main/docs/assets/imgs/EU.png?raw=true\" alt=European Union flag\" width=\"80\" />\n</figure>\n\n[python]: https://www.python.org/\n",
    "bugtrack_url": null,
    "license": "BSD 2-Clause License  Copyright (c) 2024, Erik Novak All rights reserved.  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.",
    "summary": "The data anonymization package",
    "version": "0.1.2",
    "project_urls": {
        "Docs": "https://eriknovak.github.io/anonipy",
        "Source": "https://github.com/eriknovak/anonipy"
    },
    "split_keywords": [
        "python",
        " machine learning",
        " natural language processing",
        " anonymization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7173d27956232e2b84d036570046859cb468dc0a35abe4ebad3d681a90480308",
                "md5": "f3a94f15fedeeb2089dcc0c9349d6b85",
                "sha256": "a60e6ae257bdafcaf513920b75457a60f1bcb03283335c444855b41797405c82"
            },
            "downloads": -1,
            "filename": "anonipy-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f3a94f15fedeeb2089dcc0c9349d6b85",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 35222,
            "upload_time": "2024-07-23T12:13:56",
            "upload_time_iso_8601": "2024-07-23T12:13:56.576901Z",
            "url": "https://files.pythonhosted.org/packages/71/73/d27956232e2b84d036570046859cb468dc0a35abe4ebad3d681a90480308/anonipy-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eb393056a632a59d371b63a189d2dec9cf095a71d8958a28ce98ae027e296ad7",
                "md5": "89cc655efb2475980e6d41a3439170e7",
                "sha256": "66753e4d1bebd313e02d06b6a0aa01b7ebb072913d9beb2e1fdae597c083a4a2"
            },
            "downloads": -1,
            "filename": "anonipy-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "89cc655efb2475980e6d41a3439170e7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 31644,
            "upload_time": "2024-07-23T12:13:58",
            "upload_time_iso_8601": "2024-07-23T12:13:58.004695Z",
            "url": "https://files.pythonhosted.org/packages/eb/39/3056a632a59d371b63a189d2dec9cf095a71d8958a28ce98ae027e296ad7/anonipy-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-23 12:13:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "eriknovak",
    "github_project": "anonipy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "spacy",
            "specs": []
        },
        {
            "name": "gliner",
            "specs": [
                [
                    "==",
                    "0.2.2"
                ]
            ]
        },
        {
            "name": "gliner-spacy",
            "specs": [
                [
                    "==",
                    "0.0.8"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "bitsandbytes",
            "specs": []
        },
        {
            "name": "lingua-language-detector",
            "specs": []
        },
        {
            "name": "guidance",
            "specs": [
                [
                    "==",
                    "0.1.14"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    ">=",
                    "2.9.0"
                ]
            ]
        },
        {
            "name": "sentencepiece",
            "specs": []
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    ">=",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "python-docx",
            "specs": [
                [
                    ">=",
                    "1.1.2"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": []
        }
    ],
    "lcname": "anonipy"
}
        