:Name: pycanon
:Version: 1.0.1.post2
:Home page: https://gitlab.ifca.es/privacy-security/pycanon
:Summary: pyCANON, a Python library to check the level of anonymity of a dataset
:Upload time: 2024-03-07 15:18:29
:Author: Judith Sáinz-Pardo Díaz, Álvaro López García (IFCA (CSIC-UC))
:License: Apache License 2.0
:Keywords: data, privacy, anonymity

pyCANON
=======

|License| |Documentation Status| |Pipeline Status|

pyCANON is a Python library and CLI to assess the values of the parameters
associated with the most common privacy-preserving techniques via anonymization.

**Authors:** Judith Sáinz-Pardo Díaz and Álvaro López García (IFCA - CSIC).

Installation
------------

We recommend using Python 3 with
`virtualenv <https://virtualenv.pypa.io/en/latest/>`__:

::

   virtualenv .venv -p python3
   source .venv/bin/activate

Then run the following command to install the library and all its
requirements:

::

   pip install pycanon


If you also want the functionality that generates PDF files for the reports,
install the library with the ``PDF`` extra:

::

   pip install "pycanon[PDF]"


Documentation
-------------

The pyCANON documentation is hosted on `Read the
Docs <https://pycanon.readthedocs.io/>`__.

Getting started
---------------

Example using the `adult
dataset <https://archive.ics.uci.edu/ml/datasets/adult>`__:

.. code:: python

   import pandas as pd
   from pycanon import anonymity, report

   FILE_NAME = "adult.csv"
   QI = ["age", "education", "occupation", "relationship", "sex", "native-country"]
   SA = ["salary-class"]
   DATA = pd.read_csv(FILE_NAME)

   # Calculate k for k-anonymity:
   k = anonymity.k_anonymity(DATA, QI)

   # Print the anonymity report:
   report.print_report(DATA, QI, SA)
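
To make the meaning of the value computed above concrete, the following is a
minimal sketch of what *k* (and, similarly, distinct *ℓ*) represent, written
with pandas only. It is not pyCANON's implementation: the toy table, the
column names and the variable names are assumptions made for illustration.

.. code:: python

   import pandas as pd

   # Hypothetical toy table (not the adult dataset).
   toy = pd.DataFrame(
       {
           "age": ["20-30", "20-30", "20-30", "30-40", "30-40"],
           "sex": ["F", "F", "F", "M", "M"],
           "disease": ["flu", "flu", "asthma", "flu", "cancer"],
       }
   )
   quasi_identifiers = ["age", "sex"]
   sensitive = "disease"

   # Equivalence classes: rows that share the same quasi-identifier values.
   classes = toy.groupby(quasi_identifiers)

   # k for k-anonymity: size of the smallest equivalence class.
   k = int(classes.size().min())

   # Distinct l-diversity: fewest distinct sensitive values in any class.
   ell = int(classes[sensitive].nunique().min())

   print(f"k = {k}, l = {ell}")

On a real dataset, ``anonymity.k_anonymity`` as used in the example above is
the call to rely on; the sketch only illustrates the underlying idea.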

Description
-----------

pyCANON checks whether the following privacy-preserving techniques are
satisfied and computes the value of the parameters associated with each of
them.

+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Technique                 | pyCANON function            | Parameters | Notes                                               |
+===========================+=============================+============+=====================================================+
| k-anonymity               | ``k_anonymity``             | *k*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| (α, k)-anonymity          | ``alpha_k_anonymity``       | *α*: float |                                                     |
|                           |                             | *k*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| ℓ-diversity               | ``l_diversity``             | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Entropy ℓ-diversity       | ``entropy_l_diversity``     | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Recursive (c,ℓ)-diversity | ``recursive_c_l_diversity`` | *c*: int   | Not calculated if ℓ=1                               |
|                           |                             | *ℓ*: int   |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Basic β-likeness          | ``basic_beta_likeness``     | *β*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| Enhanced β-likeness       | ``enhanced_beta_likeness``  | *β*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| t-closeness               | ``t_closeness``             | *t*: float | For numerical attributes the definition of the EMD  |
|                           |                             |            | (one-dimensional Earth Mover’s Distance) is used.   |
|                           |                             |            | For categorical attributes, the metric "Equal       |
|                           |                             |            | Distance" is used.                                  |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+
| δ-disclosure privacy      | ``delta_disclosure``        | *δ*: float |                                                     |
+---------------------------+-----------------------------+------------+-----------------------------------------------------+

More information can be found in this `paper <https://www.nature.com/articles/s41597-022-01894-2>`__.
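
As an illustration of the "Equal Distance" note for t-closeness in the table
above, the following sketch computes *t* for a categorical sensitive
attribute. It assumes the usual definition from the t-closeness literature,
in which the distance between two categorical distributions under an equal
ground distance is half the sum of the absolute differences of their
probabilities. The toy table and all names are hypothetical, and the code is
not taken from pyCANON.

.. code:: python

   import pandas as pd

   # Hypothetical toy table (not the adult dataset).
   toy = pd.DataFrame(
       {
           "age": ["20-30", "20-30", "20-30", "30-40", "30-40"],
           "sex": ["F", "F", "F", "M", "M"],
           "disease": ["flu", "flu", "asthma", "flu", "cancer"],
       }
   )
   quasi_identifiers = ["age", "sex"]
   sensitive = "disease"

   # Distribution of the sensitive attribute over the whole table.
   overall = toy[sensitive].value_counts(normalize=True)


   def equal_distance(values: pd.Series) -> float:
       """Half the L1 distance between a class distribution and the overall one."""
       local = values.value_counts(normalize=True)
       # Align categories; values absent from the class get probability 0.
       local = local.reindex(overall.index, fill_value=0.0)
       return float(0.5 * (local - overall).abs().sum())


   # t is the largest distance over all equivalence classes.
   t = max(
       equal_distance(values)
       for _, values in toy.groupby(quasi_identifiers)[sensitive]
   )
   print(f"t = {t:.3f}")

A smaller *t* means that the distribution of the sensitive attribute inside
every equivalence class stays closer to its distribution in the full table.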

In addition, a report can be obtained that includes information on the equivalence classes and the
usefulness of the data. For the latter, three classically used metrics are implemented
(as defined in the `documentation <https://pycanon.readthedocs.io/>`__):
*average equivalence class size*, *classification metric* and *discernability metric*.
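
Two of these metrics can be sketched in a few lines of pandas. The sketch
assumes the definitions commonly given in the literature: the average
equivalence class size as the number of records divided by the number of
classes times *k*, and the discernability metric as the sum of the squared
class sizes, with no suppressed records. The linked documentation remains the
reference for how pyCANON defines them; the toy table and all names below are
hypothetical.

.. code:: python

   import pandas as pd

   # Hypothetical toy table (not the adult dataset).
   toy = pd.DataFrame(
       {
           "age": ["20-30", "20-30", "20-30", "30-40", "30-40"],
           "sex": ["F", "F", "F", "M", "M"],
           "disease": ["flu", "flu", "asthma", "flu", "cancer"],
       }
   )
   quasi_identifiers = ["age", "sex"]

   # Size of each equivalence class.
   class_sizes = toy.groupby(quasi_identifiers).size()
   k = int(class_sizes.min())

   # Average equivalence class size, normalised by k (assumed definition).
   average_class_size = len(toy) / (len(class_sizes) * k)

   # Discernability metric: each record is penalised by the size of its class.
   discernability = int((class_sizes**2).sum())

   print(average_class_size, discernability)

Under these definitions, values of the average equivalence class size close
to 1 indicate classes that are not much larger than *k* requires, and a lower
discernability metric indicates smaller classes overall.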

Citation
-----------
If you use pyCANON, please cite it as follows::

   @article{sainzpardo2022pycanon,
      title={A Python library to check the level of anonymity of a dataset},
      author={S{\'a}inz-Pardo D{\'\i}az, Judith and L{\'o}pez Garc{\'\i}a, {\'A}lvaro},
      journal={Scientific Data},
      volume={9},
      number={1},
      pages={785},
      year={2022},
      publisher={Nature Publishing Group UK London}}


Acknowledgments
-----------------

The authors acknowledge funding from the European Union - NextGenerationEU
(Regulation EU 2020/2094), through CSIC’s Global Health Platform (PTI+ Salud Global), and
support from the project AI4EOSC “Artificial Intelligence for the European Open Science
Cloud”, which has received funding from the European Union’s Horizon Europe research and
innovation programme under grant agreement number 101058593.

.. |License| image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
   :target: https://gitlab.ifca.es/sainzj/check-anonymity/-/blob/main/LICENSE
.. |Documentation Status| image:: https://readthedocs.org/projects/pycanon/badge/?version=latest
   :target: https://pycanon.readthedocs.io/en/latest/?badge=latest
.. |Pipeline Status| image:: https://gitlab.ifca.es/privacy-security/pycanon/badges/main/pipeline.svg
   :target: https://gitlab.ifca.es/privacy-security/pycanon/-/pipelines

            
