anonypy


Name: anonypy
Version: 0.2.1
Summary: Anonymization library for Python
Upload time: 2024-09-20 23:52:02
Home page: None
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.10
License: None
Keywords: k-anonymity, l-diversity, mondrian, t-closeness
Requirements: none recorded
# AnonyPy
Anonymization library for Python.
AnonyPy provides the following privacy-preserving techniques for anonymization:
- k-anonymity
- l-diversity
- t-closeness
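The three privacy models can be stated as checks on a table whose quasi-identifiers have already been generalized. The sketch below is illustrative only: the helper names (`is_k_anonymous`, `is_l_diverse`, `is_t_close`) are hypothetical and not part of anonypy's API, it uses the simple "distinct" variant of l-diversity, and its t-closeness check assumes a categorical sensitive attribute whose values are all equally far apart, in which case the Earth Mover's Distance reduces to the total variation distance.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records whose quasi-identifier values are identical."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return list(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity: every equivalence class holds at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, quasi_identifiers))


def is_l_diverse(records, quasi_identifiers, sensitive, l_value):
    """Distinct l-diversity: every class has at least l_value distinct sensitive values."""
    return all(
        len({r[sensitive] for r in g}) >= l_value
        for g in equivalence_classes(records, quasi_identifiers)
    )


def is_t_close(records, quasi_identifiers, sensitive, t):
    """t-closeness for a categorical sensitive attribute (assumption: equal
    ground distance), measured as the total variation distance between each
    class's sensitive-value distribution and the whole table's."""
    n = len(records)
    overall = Counter(r[sensitive] for r in records)
    for g in equivalence_classes(records, quasi_identifiers):
        local = Counter(r[sensitive] for r in g)
        distance = 0.5 * sum(abs(local[v] / len(g) - overall[v] / n) for v in overall)
        if distance > t:
            return False
    return True


# Hypothetical toy table with already-generalized quasi-identifiers (age, zip).
sample = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
print(is_k_anonymous(sample, ["age", "zip"], k=2))                 # True
print(is_l_diverse(sample, ["age", "zip"], "disease", l_value=2))  # False: second class is all "flu"
print(is_t_close(sample, ["age", "zip"], "disease", t=0.3))        # True: both distances are 0.25
```

The example shows that the models are progressively stricter: the toy table is 2-anonymous, yet one class reveals the disease outright, which is the gap l-diversity and t-closeness are meant to close.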

## The anonymization method
- Anonymization aims to make each individual record indistinguishable from the other records in its group, using generalization and suppression.
- Turning a dataset into a k-anonymous (and possibly l-diverse or t-close) dataset is a complex problem: finding the optimal partition into k-anonymous groups is NP-hard.
- AnonyPy uses the "Mondrian" algorithm to partition the original data into smaller and smaller groups.
- The algorithm assumes that all attributes have been converted into numerical or categorical values and that we can measure the “span” of a given attribute Xi.
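The Mondrian idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not anonypy's implementation: the function names are hypothetical, the span of a categorical attribute is taken to be its number of distinct values and that of a numerical attribute its range, spans are normalized by their value over the whole dataset, numerical columns are cut at the median, and categorical columns are cut by splitting their sorted distinct values in half.

```python
from statistics import median


def span(values, categorical):
    """Span of one attribute inside a partition: number of distinct values
    for a categorical attribute, max - min for a numerical one."""
    if categorical:
        return len(set(values))
    return max(values) - min(values)


def split(records, column, categorical):
    """Cut a partition in two along one column."""
    if categorical:
        # The first half of the sorted distinct values goes left.
        distinct = sorted({r[column] for r in records})
        left_values = set(distinct[: len(distinct) // 2])
        goes_left = [r[column] in left_values for r in records]
    else:
        # Records strictly below the median go left.
        cut = median(r[column] for r in records)
        goes_left = [r[column] < cut for r in records]
    left = [r for r, g in zip(records, goes_left) if g]
    right = [r for r, g in zip(records, goes_left) if not g]
    return left, right


def mondrian(records, columns, categorical, k, full_spans=None):
    """Recursively partition records; every returned partition has >= k records."""
    if full_spans is None:
        # Whole-dataset spans, used to make spans of different columns comparable.
        full_spans = {c: span([r[c] for r in records], c in categorical) or 1 for c in columns}

    def normalized_span(c):
        return span([r[c] for r in records], c in categorical) / full_spans[c]

    # Try the widest column first; fall back to narrower ones if a cut
    # would leave fewer than k records on either side.
    for column in sorted(columns, key=normalized_span, reverse=True):
        left, right = split(records, column, column in categorical)
        if len(left) >= k and len(right) >= k:
            return (mondrian(left, columns, categorical, k, full_spans)
                    + mondrian(right, columns, categorical, k, full_spans))
    return [records]  # no allowable cut: this partition is final


# The sample data from the Usage section, as a list of dicts.
data = [
    [6, "1", "test1", "x", 20], [6, "1", "test1", "x", 30],
    [8, "2", "test2", "x", 50], [8, "2", "test3", "w", 45],
    [8, "1", "test2", "y", 35], [4, "2", "test3", "y", 20],
    [4, "1", "test3", "y", 20], [2, "1", "test3", "z", 22],
    [2, "2", "test3", "y", 32],
]
records = [dict(zip(["col1", "col2", "col3", "col4", "col5"], row)) for row in data]
partitions = mondrian(records, ["col1", "col2", "col3"], {"col2", "col3"}, k=2)
print([len(p) for p in partitions])  # [2, 2, 3, 2]
```

On this data the sketch cuts first on `col1` and then on `col2`, ending with four partitions that each hold at least two records.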

## Install
```
$ pip install anonypy
```

## Usage
```python
import anonypy
import pandas as pd

data = [
    [6, "1", "test1", "x", 20],
    [6, "1", "test1", "x", 30],
    [8, "2", "test2", "x", 50],
    [8, "2", "test3", "w", 45],
    [8, "1", "test2", "y", 35],
    [4, "2", "test3", "y", 20],
    [4, "1", "test3", "y", 20],
    [2, "1", "test3", "z", 22],
    [2, "2", "test3", "y", 32],
]

columns = ["col1", "col2", "col3", "col4", "col5"]
categorical = set(("col2", "col3", "col4"))

def main():
    df = pd.DataFrame(data=data, columns=columns)

    for name in categorical:
        df[name] = df[name].astype("category")

    feature_columns = ["col1", "col2", "col3"]
    sensitive_column = "col4"

    p = anonypy.Preserver(df, feature_columns, sensitive_column)
    rows = p.anonymize_k_anonymity(k=2)

    dfn = pd.DataFrame(rows)
    print(dfn)


if __name__ == "__main__":
    main()
```

Original data:
```text
   col1 col2   col3 col4  col5
0     6    1  test1    x    20
1     6    1  test1    x    30
2     8    2  test2    x    50
3     8    2  test3    w    45
4     8    1  test2    y    35
5     4    2  test3    y    20
6     4    1  test3    y    20
7     2    1  test3    z    22
8     2    2  test3    y    32
```

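The resulting anonymized data, shown below, satisfies 2-anonymity. `col5` is neither a feature column nor the sensitive column, so it does not appear in the output; `count` is the number of records in each group that share the listed `col4` value.
```text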
  col1 col2         col3 col4  count
0  2-4    2        test3    y      2
1  2-4    1        test3    y      1
2  2-4    1        test3    z      1
3  6-8    1  test1,test2    x      2
4  6-8    1  test1,test2    y      1
5    8    2  test3,test2    w      1
6    8    2  test3,test2    x      1
```
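Each output row combines a generalized group with one sensitive value and its count. The sketch below shows one way to turn a single partition into rows of that shape. It is an assumption-laden illustration, not anonypy's code: `generalize` and `summarize` are hypothetical names, numerical values become a `min-max` range (or a single value when they coincide), and categorical values are joined with commas in first-seen order, whereas the library's own ordering may differ (it prints `test3,test2`, for example).

```python
from collections import Counter


def generalize(values, categorical):
    """Replace a partition's values with one generalized value."""
    if categorical:
        # Comma-joined distinct values, in first-seen order.
        return ",".join(dict.fromkeys(str(v) for v in values))
    low, high = min(values), max(values)
    return str(low) if low == high else f"{low}-{high}"


def summarize(partition, feature_columns, categorical, sensitive):
    """Emit one row per sensitive value present in the partition."""
    generalized = {
        c: generalize([r[c] for r in partition], c in categorical)
        for c in feature_columns
    }
    counts = Counter(r[sensitive] for r in partition)
    return [{**generalized, sensitive: value, "count": n} for value, n in counts.items()]


# Rows 0, 1 and 4 of the original data form one group in the output above.
part = [
    {"col1": 6, "col2": "1", "col3": "test1", "col4": "x"},
    {"col1": 6, "col2": "1", "col3": "test1", "col4": "x"},
    {"col1": 8, "col2": "1", "col3": "test2", "col4": "y"},
]
for row in summarize(part, ["col1", "col2", "col3"], {"col2", "col3"}, "col4"):
    print(row)
```

For this group the sketch yields `6-8 / 1 / test1,test2` with `x` counted twice and `y` once, matching rows 3 and 4 of the table above.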

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "anonypy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "k-anonymity, l-diversity, mondrian, t-closeness",
    "author": null,
    "author_email": "glassonion1 <glassonion999@gmail.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Anonymization library for python",
    "version": "0.2.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/glassonion1/anonypy/issues",
        "Homepage": "https://github.com/glassonion1/anonypy"
    },
    "split_keywords": [
        "k-anonymity",
        " l-diversity",
        " mondrian",
        " t-closeness"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "270973426cb7390b78f4ab8a74fb54ba57b7cc4b377a5b16e9709e15a5bc0dc2",
                "md5": "ad08fd231f08a6b3c0df5724feceed2d",
                "sha256": "ad5a1c14e69dc6399dee8e94ca2b82efec3da5cd13c0e64048354b752296ac1e"
            },
            "downloads": -1,
            "filename": "anonypy-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad08fd231f08a6b3c0df5724feceed2d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 6757,
            "upload_time": "2024-09-20T23:52:02",
            "upload_time_iso_8601": "2024-09-20T23:52:02.135517Z",
            "url": "https://files.pythonhosted.org/packages/27/09/73426cb7390b78f4ab8a74fb54ba57b7cc4b377a5b16e9709e15a5bc0dc2/anonypy-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-20 23:52:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "glassonion1",
    "github_project": "anonypy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "anonypy"
}