anjana


* Name: anjana
* Version: 0.0.1.post1
* Home page: https://gitlab.ifca.es/privacy-security/anjana
* Summary: ANJANA is an open source framework for applying different anonymity techniques.
* Upload time: 2024-04-11 11:30:57
* Maintainer: Judith Sáinz-Pardo Díaz
* Author: Judith Sáinz-Pardo Díaz
* Requires Python: <4.0,>=3.9
* License: Apache-2.0
* Keywords: anonymity, privacy

# ANJANA
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://gitlab.ifca.es/privacy-security/anjana/-/blob/main/LICENSE)
[![Pipeline Status](https://gitlab.ifca.es/privacy-security/anjana/badges/main/pipeline.svg)](https://gitlab.ifca.es/privacy-security/anjana/-/pipelines)

![Python version](https://img.shields.io/badge/python-3.9|3.10|3.11|3.12-blue)


**Anonymity as major assurance of personal data privacy**

ANJANA is a Python library for anonymizing sensitive data.

The following anonymity techniques are implemented, based on the Python library _[pyCANON](https://github.com/IFCA-Advanced-Computing/pycanon)_:
* _k-anonymity_.
* _(α,k)-anonymity_.
* _ℓ-diversity_.
* _Entropy ℓ-diversity_.
* _Recursive (c,ℓ)-diversity_.
* _t-closeness_.
* _Basic β-likeness_.
* _Enhanced β-likeness_.
* _δ-disclosure privacy_.
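
As a rough illustration of what _k-anonymity_ and _ℓ-diversity_ mean, the sketch below checks both on a toy table in plain Python. It is not ANJANA's or pyCANON's implementation: the helper names `equivalence_classes`, `k_of` and `l_of` and the sample records are made up for this example. A table is k-anonymous when every combination of quasi-identifier values is shared by at least _k_ records, and (distinct) ℓ-diverse when each such group contains at least _ℓ_ different values of the sensitive attribute.

```python
from collections import defaultdict

# Toy records (invented): "age" and "sex" are quasi-identifiers,
# "disease" is the sensitive attribute.
records = [
    {"age": "20-29", "sex": "F", "disease": "flu"},
    {"age": "20-29", "sex": "F", "disease": "asthma"},
    {"age": "30-39", "sex": "M", "disease": "flu"},
    {"age": "30-39", "sex": "M", "disease": "flu"},
    {"age": "30-39", "sex": "M", "disease": "diabetes"},
]
quasi_ident = ["age", "sex"]
sens_att = "disease"


def equivalence_classes(rows, qi):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in qi)].append(row)
    return classes


def k_of(rows, qi):
    """Largest k for which the rows are k-anonymous (smallest class size)."""
    return min(len(group) for group in equivalence_classes(rows, qi).values())


def l_of(rows, qi, sensitive):
    """Largest l for distinct l-diversity (fewest distinct sensitive values)."""
    return min(
        len({row[sensitive] for row in group})
        for group in equivalence_classes(rows, qi).values()
    )


print(k_of(records, quasi_ident))            # 2
print(l_of(records, quasi_ident, sens_att))  # 2
```

Anonymization raises these values by generalizing quasi-identifiers (so that more records fall into the same group) and by suppressing records that still end up in groups that are too small.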

## Getting started

To anonymize your data, you need to provide:
* The **pandas dataframe** with the data to be anonymized. Each column can contain identifiers, quasi-identifiers or sensitive attributes.
* The **list with the names of the identifiers** in the dataframe, so that they can be suppressed.
* The **list with the names of the quasi-identifiers** in the dataframe.
* The **sensitive attribute** (only one), needed when applying any technique other than _k-anonymity_.
* The **level of anonymity to be applied**, e.g. _k_ (for _k-anonymity_), _ℓ_ (for _ℓ-diversity_), _t_ (for _t-closeness_), _β_ (for _basic or enhanced β-likeness_), etc.
* The maximum **level of record suppression** allowed (from 0 to 100).
* A dictionary containing one dictionary for each quasi-identifier with the **hierarchies** and the levels.
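
To make the last item more concrete, the sketch below shows the shape that `dict(pd.read_csv(..., header=None))` produces for one quasi-identifier. The CSV content is invented for illustration and is not one of the project's hierarchy files. The reading that column 0 holds the original values and each later column holds a coarser generalization level is an assumption, based on how the example in this README loads its hierarchies.

```python
import io

import pandas as pd

# Hypothetical hierarchy file for "age": one row per original value,
# one column per generalization level (assumed layout).
age_csv = io.StringIO(
    "25,20-29,20-39,*\n"
    "34,30-39,20-39,*\n"
    "47,40-49,40-59,*\n"
)

# dict() over a DataFrame maps each column label to its Series, so with
# header=None the keys are the column positions 0, 1, 2, ...
age_hierarchy = dict(pd.read_csv(age_csv, header=None))

for level, values in age_hierarchy.items():
    print(level, values.tolist())

# The full argument maps each quasi-identifier name to such a dictionary.
hierarchies = {"age": age_hierarchy}
```

Under this assumed layout, moving one column to the right replaces each value with a less specific one, until every record shares the same fully generalized value.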

### Example: apply _k-anonymity_, _ℓ-diversity_ and _t-closeness_ to the [adult dataset](https://archive.ics.uci.edu/dataset/2/adult) with some predefined hierarchies:
```python
import pandas as pd
import anjana
from anjana.anonymity import k_anonymity, l_diversity, t_closeness

# Read and process the data
data = pd.read_csv("adult.csv") 
data.columns = data.columns.str.strip()
cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
for col in cols:
    data[col] = data[col].str.strip()

# Define the identifiers, quasi-identifiers and the sensitive attribute
quasi_ident = [
    "age",
    "education",
    "marital-status",
    "occupation",
    "sex",
    "native-country",
]
ident = ["race"]
sens_att = "salary-class"

# Select the desired level of k, l and t
k = 10
l_div = 2
t = 0.5

# Select the suppression limit allowed
supp_level = 50

# Import the hierarchies for each quasi-identifier and collect them in a dictionary
hierarchies = {
    "age": dict(pd.read_csv("hierarchies/age.csv", header=None)),
    "education": dict(pd.read_csv("hierarchies/education.csv", header=None)),
    "marital-status": dict(pd.read_csv("hierarchies/marital.csv", header=None)),
    "occupation": dict(pd.read_csv("hierarchies/occupation.csv", header=None)),
    "sex": dict(pd.read_csv("hierarchies/sex.csv", header=None)),
    "native-country": dict(pd.read_csv("hierarchies/country.csv", header=None)),
}

# Apply the three functions: k-anonymity, l-diversity and t-closeness
data_anon = k_anonymity(data, ident, quasi_ident, k, supp_level, hierarchies)
data_anon = l_diversity(
    data_anon, ident, quasi_ident, sens_att, k, l_div, supp_level, hierarchies
)
data_anon = t_closeness(
    data_anon, ident, quasi_ident, sens_att, k, t, supp_level, hierarchies
)
```

The code above runs in less than 4 seconds on the more than 30,000 records of the original dataset.

## License
This project is licensed under the [Apache 2.0 license](https://gitlab.ifca.es/privacy-security/anjana/-/blob/main/LICENSE?ref_type=heads).

## Project status
This project is under active development.

## Funding and acknowledgments
This work is funded by the European Union through the SIESTA project (Horizon Europe) under grant number 101131957.
<p>
<img align="center" width="250" src="https://ec.europa.eu/regional_policy/images/information-sources/logo-download-center/eu_funded_en.jpg">
<img align="center" width="250" src="https://eosc.eu/wp-content/uploads/2024/01/SIESTA-Logo-1.png">
</p>


----
**_Note: Anjana and the mythology of Cantabria_**
<p align="center">
    <i>
"La Anjana" is a character from the mythology of Cantabria. Known as the good fairy of Cantabria, generous and protective of all people, she helps the poor, the suffering and those who stray in the forest. 
    </i>
</p>
<p align="center">
    <i>
- Partially extracted from: Cotera, Gustavo. Mitología de Cantabria. Ed. Tantin, Santander, 1998.
    </i>
    </p>
</div>



            

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.ifca.es/privacy-security/anjana",
    "name": "anjana",
    "maintainer": "Judith S\u00e1inz-Pardo D\u00edaz",
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": "sainzpardo@ifca.unican.es",
    "keywords": "anonymity, privacy",
    "author": "Judith S\u00e1inz-Pardo D\u00edaz",
    "author_email": "sainzpardo@ifca.unican.es",
    "download_url": "https://files.pythonhosted.org/packages/7c/b1/aae18d710bc018d0238c2a3909f5b97d1b72d1034a247c6eb298882a76b8/anjana-0.0.1.post1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "ANJANA is an open source framework for applying different anonymity techniques.",
    "version": "0.0.1.post1",
    "project_urls": {
        "Homepage": "https://gitlab.ifca.es/privacy-security/anjana",
        "Repository": "https://gitlab.ifca.es/privacy-security/anjana"
    },
    "split_keywords": [
        "anonymity",
        " privacy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dc7aaea4c97ab5e7a6d482a7e170147fb61c42e5c5469d905e01227eb8d79176",
                "md5": "e9d2fef52487cd5745f87b7bce86c3ea",
                "sha256": "5e9604b8112907a4b8001829b60558bbd4c5696a74f793dd74820feb33b5ecfa"
            },
            "downloads": -1,
            "filename": "anjana-0.0.1.post1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e9d2fef52487cd5745f87b7bce86c3ea",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 20272,
            "upload_time": "2024-04-11T11:30:56",
            "upload_time_iso_8601": "2024-04-11T11:30:56.276534Z",
            "url": "https://files.pythonhosted.org/packages/dc/7a/aea4c97ab5e7a6d482a7e170147fb61c42e5c5469d905e01227eb8d79176/anjana-0.0.1.post1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7cb1aae18d710bc018d0238c2a3909f5b97d1b72d1034a247c6eb298882a76b8",
                "md5": "a00eb45ac7095f246f96808569e788a3",
                "sha256": "6f00e34aacb30d0138f4f4eadd0bcf4d83316014f97d3efe16e0a48d348dbb2d"
            },
            "downloads": -1,
            "filename": "anjana-0.0.1.post1.tar.gz",
            "has_sig": false,
            "md5_digest": "a00eb45ac7095f246f96808569e788a3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 13089,
            "upload_time": "2024-04-11T11:30:57",
            "upload_time_iso_8601": "2024-04-11T11:30:57.995016Z",
            "url": "https://files.pythonhosted.org/packages/7c/b1/aae18d710bc018d0238c2a3909f5b97d1b72d1034a247c6eb298882a76b8/anjana-0.0.1.post1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-11 11:30:57",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "anjana"
}
        