pyCANON
=======
|License| |Documentation Status| |Pipeline Status|
pyCANON is a Python library and CLI to assess the values of the parameters
associated with the most common privacy-preserving techniques via anonymization.
**Authors:** Judith Sáinz-Pardo Díaz and Álvaro López García (IFCA - CSIC).
Installation
------------
We recommend using Python 3 with
`virtualenv <https://virtualenv.pypa.io/en/latest/>`__:

::

    virtualenv .venv -p python3
    source .venv/bin/activate

Then run the following command to install the library and all its
requirements:

::

    pip install pycanon

To also install the optional functionality for generating PDF reports,
install the ``PDF`` extra (quoted so that shells such as zsh do not expand
the brackets):

::

    pip install "pycanon[PDF]"

Documentation
-------------
The pyCANON documentation is hosted on `Read the
Docs <https://pycanon.readthedocs.io/>`__.
Getting started
---------------
Example using the `adult
dataset <https://archive.ics.uci.edu/ml/datasets/adult>`__:

.. code:: python

    import pandas as pd
    from pycanon import anonymity, report

    FILE_NAME = "adult.csv"
    QI = ["age", "education", "occupation", "relationship", "sex", "native-country"]
    SA = ["salary-class"]
    DATA = pd.read_csv(FILE_NAME)

    # Calculate k for k-anonymity:
    k = anonymity.k_anonymity(DATA, QI)

    # Print the anonymity report:
    report.print_report(DATA, QI, SA)

Description
-----------
pyCANON checks whether the following privacy-preserving techniques are
satisfied and computes the value of the parameters associated with each of
them:

+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Technique                 | pyCANON function            | Parameters | Notes                                               |
+===========================+=============================+============+=====================================================+
| k-anonymity               | ``k_anonymity``             | *k*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| (α, k)-anonymity          | ``alpha_k_anonymity``       | *α*: float |                                                     |
|                           |                             | *k*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| ℓ-diversity               | ``l_diversity``             | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Entropy ℓ-diversity       | ``entropy_l_diversity``     | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Recursive (c,ℓ)-diversity | ``recursive_c_l_diversity`` | *c*: int   | Not calculated if ℓ=1                               |
|                           |                             | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Basic β-likeness          | ``basic_beta_likeness``     | *β*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Enhanced β-likeness       | ``enhanced_beta_likeness``  | *β*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| t-closeness               | ``t_closeness``             | *t*: float | For numerical attributes the definition of the EMD  |
|                           |                             |            | (one-dimensional Earth Mover’s Distance) is used.   |
|                           |                             |            | For categorical attributes, the metric "Equal       |
|                           |                             |            | Distance" is used.                                  |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| δ-disclosure privacy      | ``delta_disclosure``        | *δ*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+

More information can be found in this `paper <https://www.nature.com/articles/s41597-022-01894-2>`__.
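
To make the parameters in the table more concrete, the following is a minimal,
standard-library sketch of what *k*, (distinct) *ℓ* and *t* (with the "Equal
Distance" metric) measure on a toy table. It is an illustration only and not
pyCANON's implementation: the records, the column names and the helper
functions ``k_of``, ``l_of`` and ``t_of`` are hypothetical.

.. code:: python

    from collections import Counter, defaultdict

    # Toy records; column names and values are made up for illustration.
    ROWS = [
        {"age": "30-39", "sex": "F", "disease": "flu"},
        {"age": "30-39", "sex": "F", "disease": "asthma"},
        {"age": "30-39", "sex": "F", "disease": "flu"},
        {"age": "40-49", "sex": "M", "disease": "flu"},
        {"age": "40-49", "sex": "M", "disease": "diabetes"},
    ]
    QI = ["age", "sex"]  # quasi-identifiers
    SA = "disease"       # sensitive attribute


    def equivalence_classes(rows, qi):
        """Group records that share the same quasi-identifier values."""
        classes = defaultdict(list)
        for row in rows:
            classes[tuple(row[col] for col in qi)].append(row)
        return list(classes.values())


    def k_of(rows, qi):
        """k-anonymity: size of the smallest equivalence class."""
        return min(len(ec) for ec in equivalence_classes(rows, qi))


    def l_of(rows, qi, sa):
        """Distinct l-diversity: fewest distinct sensitive values in a class."""
        return min(len({r[sa] for r in ec}) for ec in equivalence_classes(rows, qi))


    def t_of(rows, qi, sa):
        """t-closeness with equal distance: the largest value, over all
        classes, of half the L1 distance between the class distribution and
        the overall distribution of the sensitive attribute."""
        overall = Counter(r[sa] for r in rows)
        worst = 0.0
        for ec in equivalence_classes(rows, qi):
            local = Counter(r[sa] for r in ec)
            dist = 0.5 * sum(
                abs(local[v] / len(ec) - overall[v] / len(rows)) for v in overall
            )
            worst = max(worst, dist)
        return worst


    print(k_of(ROWS, QI), l_of(ROWS, QI, SA), t_of(ROWS, QI, SA))

Larger *k* and *ℓ* and smaller *t* indicate stronger protection. For real
datasets, use the pyCANON functions listed in the table rather than these
helpers.
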
In addition, pyCANON can generate a report with information on the equivalence classes and the
utility of the data. For the latter, the following three classic metrics
are implemented (as defined in the `documentation <https://pycanon.readthedocs.io/>`__):
*average equivalence class size*, *classification metric* and *discernability metric*.
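
As a rough illustration of two of these metrics, the sketch below computes the
average equivalence class size and the discernability metric on a toy table
using common textbook definitions (mean class size, optionally normalized by
*k*, and the sum of squared class sizes). The exact definitions and
normalization used by pyCANON are those given in its documentation and may
differ; the data and function names here are hypothetical.

.. code:: python

    from collections import Counter

    # Toy anonymized records; only the quasi-identifiers matter here.
    ROWS = [
        {"age": "30-39", "sex": "F"},
        {"age": "30-39", "sex": "F"},
        {"age": "30-39", "sex": "F"},
        {"age": "40-49", "sex": "M"},
        {"age": "40-49", "sex": "M"},
    ]
    QI = ["age", "sex"]


    def class_sizes(rows, qi):
        """Return the size of each equivalence class."""
        return list(Counter(tuple(row[c] for c in qi) for row in rows).values())


    def average_ec_size(rows, qi, normalize=False):
        """Mean class size; if normalize is True, divide by k (smallest class)."""
        sizes = class_sizes(rows, qi)
        avg = len(rows) / len(sizes)
        return avg / min(sizes) if normalize else avg


    def discernability_metric(rows, qi):
        """Sum of squared class sizes (no suppressed records assumed)."""
        return sum(size * size for size in class_sizes(rows, qi))


    print(average_ec_size(ROWS, QI))
    print(average_ec_size(ROWS, QI, normalize=True))
    print(discernability_metric(ROWS, QI))

For both metrics, lower values mean smaller equivalence classes and therefore
less information lost to generalization.
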
Citation
-----------
If you are using pyCANON you can cite it as follows::

    @article{sainzpardo2022pycanon,
      title={A Python library to check the level of anonymity of a dataset},
      author={S{\'a}inz-Pardo D{\'\i}az, Judith and L{\'o}pez Garc{\'\i}a, {\'A}lvaro},
      journal={Scientific Data},
      volume={9},
      number={1},
      pages={785},
      year={2022},
      publisher={Nature Publishing Group UK London}}

Acknowledgments
-----------------
The authors acknowledge funding from the European Union - NextGenerationEU
(Regulation EU 2020/2094), through CSIC’s Global Health Platform (PTI+ Salud Global), and
support from the project AI4EOSC “Artificial Intelligence for the European Open Science
Cloud”, which has received funding from the European Union’s Horizon Europe research and
innovation programme under grant agreement number 101058593.

.. |License| image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
:target: https://gitlab.ifca.es/sainzj/check-anonymity/-/blob/main/LICENSE
.. |Documentation Status| image:: https://readthedocs.org/projects/pycanon/badge/?version=latest
:target: https://pycanon.readthedocs.io/en/latest/?badge=latest
.. |Pipeline Status| image:: https://gitlab.ifca.es/privacy-security/pycanon/badges/main/pipeline.svg
:target: https://gitlab.ifca.es/privacy-security/pycanon/-/pipelines