# AnonyPy
Anonymization library for Python.
AnonyPy provides the following privacy-preserving anonymization techniques:
- K Anonymity
- L Diversity
- T Closeness
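
As a rough illustration of what the first two guarantees mean, the sketch below checks them on a small table with pandas. It does not use AnonyPy; the helper names `is_k_anonymous` and `is_l_diverse`, the parameter `min_distinct`, and the sample columns are hypothetical and chosen only for this example. It uses the simple "distinct values" form of l-diversity.

```python
import pandas as pd


def is_k_anonymous(df, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())


def is_l_diverse(df, quasi_identifiers, sensitive, min_distinct):
    """True if every group holds at least `min_distinct` distinct sensitive values."""
    distinct = df.groupby(quasi_identifiers)[sensitive].nunique()
    return bool((distinct >= min_distinct).all())


# Hypothetical, already generalized records (not from AnonyPy).
records = pd.DataFrame(
    {
        "age_range": ["20-29", "20-29", "30-39", "30-39"],
        "zip_prefix": ["130", "130", "148", "148"],
        "diagnosis": ["flu", "asthma", "flu", "flu"],
    }
)
quasi = ["age_range", "zip_prefix"]

print(is_k_anonymous(records, quasi, k=2))  # True: both groups have 2 rows
print(is_l_diverse(records, quasi, "diagnosis", min_distinct=2))  # False: one group only has "flu"
```

The second check fails because a table can be k-anonymous while every record in a group still shares the same sensitive value, which is the gap l-diversity is meant to close.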
## The Anonymization method
- Anonymization aims to make each individual record indistinguishable from the other records in its group by means of generalization and suppression.
- Turning a dataset into a k-anonymous (and possibly l-diverse or t-close) dataset is a complex problem, and finding the optimal partition into k-anonymous groups is NP-hard.
- Rather than searching for the optimum, AnonyPy uses the greedy "Mondrian" algorithm, which recursively partitions the original data into smaller and smaller groups.
- The algorithm assumes that all attributes have been converted into numerical or categorical values and that the “span” of a given attribute Xi can be measured.
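
The partitioning idea described above can be sketched as follows. This is a simplified illustration and not AnonyPy's actual implementation: it assumes numeric columns only, takes the span to be `max - min`, and always splits at the median. The function names `span`, `split_on_median`, and `mondrian` are hypothetical.

```python
import pandas as pd


def span(df, column):
    """Span of a numeric column: the width of its value range."""
    return df[column].max() - df[column].min()


def split_on_median(df, column):
    """Split rows into those below the median and those at or above it."""
    median = df[column].median()
    return df[df[column] < median], df[df[column] >= median]


def mondrian(df, columns, k):
    """Recursively split on the widest column while both halves keep >= k rows."""
    for column in sorted(columns, key=lambda c: -span(df, c)):
        left, right = split_on_median(df, column)
        if len(left) >= k and len(right) >= k:
            return mondrian(left, columns, k) + mondrian(right, columns, k)
    return [df]  # no allowed split left: this group is a final partition


# The two numeric columns of the usage example in this README.
df = pd.DataFrame(
    {
        "col1": [6, 6, 8, 8, 8, 4, 4, 2, 2],
        "col5": [20, 30, 50, 45, 35, 20, 20, 22, 32],
    }
)
partitions = mondrian(df, ["col1", "col5"], k=2)
print([len(p) for p in partitions])  # [4, 2, 3]
```

Every final partition holds at least `k` rows, so replacing each column by its range within a partition yields a k-anonymous table. A fuller version would also handle categorical columns, for example by using the number of distinct values as the span.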
## Install
```
$ pip install anonypy
```
## Usage
```python
import anonypy
import pandas as pd

data = [
    [6, "1", "test1", "x", 20],
    [6, "1", "test1", "x", 30],
    [8, "2", "test2", "x", 50],
    [8, "2", "test3", "w", 45],
    [8, "1", "test2", "y", 35],
    [4, "2", "test3", "y", 20],
    [4, "1", "test3", "y", 20],
    [2, "1", "test3", "z", 22],
    [2, "2", "test3", "y", 32],
]

columns = ["col1", "col2", "col3", "col4", "col5"]
categorical = {"col2", "col3", "col4"}


def main():
    df = pd.DataFrame(data=data, columns=columns)

    # Convert the categorical columns to the pandas "category" dtype.
    for name in categorical:
        df[name] = df[name].astype("category")

    feature_columns = ["col1", "col2", "col3"]  # columns to generalize
    sensitive_column = "col4"  # column reported per group

    p = anonypy.Preserver(df, feature_columns, sensitive_column)
    rows = p.anonymize_k_anonymity(k=2)

    dfn = pd.DataFrame(rows)
    print(dfn)


if __name__ == "__main__":
    main()
```
Original data
```bash
   col1  col2   col3  col4  col5
0     6     1  test1     x    20
1     6     1  test1     x    30
2     8     2  test2     x    50
3     8     2  test3     w    45
4     8     1  test2     y    35
5     4     2  test3     y    20
6     4     1  test3     y    20
7     2     1  test3     z    22
8     2     2  test3     y    32
```
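The resulting anonymized data is shown below. It satisfies 2-anonymity: every combination of `col1`, `col2`, and `col3` covers at least two of the original records.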
```bash
   col1  col2         col3  col4  count
0   2-4     2        test3     y      2
1   2-4     1        test3     y      1
2   2-4     1        test3     z      1
3   6-8     1  test1,test2     x      2
4   6-8     1  test1,test2     y      1
5     8     2  test3,test2     w      1
6     8     2  test3,test2     x      1
```
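
One way to double-check the 2-anonymity claim is to add up the `count` column per combination of feature values. The snippet below is a small sketch using pandas only, with the anonymized rows copied by hand from the table above; the variable names are illustrative.

```python
import pandas as pd

# Anonymized rows copied from the output table above.
anonymized = pd.DataFrame(
    [
        ["2-4", "2", "test3", "y", 2],
        ["2-4", "1", "test3", "y", 1],
        ["2-4", "1", "test3", "z", 1],
        ["6-8", "1", "test1,test2", "x", 2],
        ["6-8", "1", "test1,test2", "y", 1],
        ["8", "2", "test3,test2", "w", 1],
        ["8", "2", "test3,test2", "x", 1],
    ],
    columns=["col1", "col2", "col3", "col4", "count"],
)
feature_columns = ["col1", "col2", "col3"]

# Number of original records behind each generalized feature combination.
group_sizes = anonymized.groupby(feature_columns)["count"].sum()
print(group_sizes.min())  # 2: the smallest group still covers two records
print(group_sizes.sum())  # 9: all original records are accounted for
```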
Raw data
{
"_id": null,
"home_page": null,
"name": "anonypy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "k-anonymity, l-diversity, mondrian, t-closeness",
"author": null,
"author_email": "glassonion1 <glassonion999@gmail.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Anonymization library for python",
"version": "0.2.1",
"project_urls": {
"Bug Tracker": "https://github.com/glassonion1/anonypy/issues",
"Homepage": "https://github.com/glassonion1/anonypy"
},
"split_keywords": [
"k-anonymity",
" l-diversity",
" mondrian",
" t-closeness"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "270973426cb7390b78f4ab8a74fb54ba57b7cc4b377a5b16e9709e15a5bc0dc2",
"md5": "ad08fd231f08a6b3c0df5724feceed2d",
"sha256": "ad5a1c14e69dc6399dee8e94ca2b82efec3da5cd13c0e64048354b752296ac1e"
},
"downloads": -1,
"filename": "anonypy-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ad08fd231f08a6b3c0df5724feceed2d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 6757,
"upload_time": "2024-09-20T23:52:02",
"upload_time_iso_8601": "2024-09-20T23:52:02.135517Z",
"url": "https://files.pythonhosted.org/packages/27/09/73426cb7390b78f4ab8a74fb54ba57b7cc4b377a5b16e9709e15a5bc0dc2/anonypy-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-20 23:52:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "glassonion1",
"github_project": "anonypy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "anonypy"
}