microagg1d


Namemicroagg1d JSON
Version 0.4.0 PyPI version JSON
download
home_pageNone
SummaryA package to perform optimal univariate microaggregation for various cost functions.
upload_time2025-01-08 16:06:58
maintainerNone
docs_urlNone
authorNone
requires_python>=3.9
licenseBSD 2-Clause License Copyright (c) 2023, Feelx234 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
keywords microaggregation clustering kmeans k-means anonymity
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![build](https://github.com/Feelx234/microagg1d/actions/workflows/ci.yaml/badge.svg)](https://github.com/Feelx234/microagg1d/actions)
[![Coverage Status](https://coveralls.io/repos/github/Feelx234/microagg1d/badge.svg)](https://coveralls.io/github/Feelx234/microagg1d)
[![arXiv:2401.02381](https://img.shields.io/badge/arXiv-2401.02381-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2401.02381)

microagg1d
========

A Python library which implements different techniques for optimal univariate microaggregation. The two main parameters that determine the runtime are the length n of the input array and minimal class size k. This package offers both O(n) (fast for large k) and O(kn) (fast for small k) algorithms.

The code is written in Python and relies on the [numba](https://numba.pydata.org/) compiler for speed.

Requirements
------------

*microagg1d* relies on `numpy` and `numba` which currently support python 3.8-3.11.

Installation
------------

microagg1d is available on [PyPI](https://pypi.python.org/pypi/microagg1d), the Python Package Index.

```sh
$ pip3 install microagg1d
```

Example Usage
-------------

```python
from microagg1d import univariate_microaggregation

x = [5, 1, 1, 1.1, 5, 1, 5.1]

clusters = univariate_microaggregation(x, k=3)

print(clusters)   # [1 0 0 0 1 0 1]

# explicitly choose method / algorithm
clusters2 = univariate_microaggregation(x, k=3, method="wilber")

print(clusters2)   # [1 0 0 0 1 0 1]

# choose a different cost (sae / sse / roundup / rounddown / maxdist)
clusters3 = univariate_microaggregation(x, k=3, cost="sae")

print(clusters3)   # [1 0 0 0 1 0 1]
```

**Important notice**: On first import the the code is compiled once which may take about 30s. On subsequent imports this is no longer necessary and imports are almost instant.

Tests
-----

Tests are in [tests/](https://github.com/Feelx234/microagg1d/tree/main/tests).

```sh
# Run tests
$ python3 -m pytest .
```

Method Details
--------------

Currently the package implements the following methods:
- `"simple"` [O(nk), faster for small k]
- `"wilber"` [O(n), faster for larger k]
- `"galil_park"` [O(n), fewer calls to SMAWK]
- `"staggered"` [fastest O(n)]

By default, the package switches between the simple and wilber method depending on the size of k.

Both methods rely on a prefix sum approach to compute the cluster cost. As the prefix sums tend to become very large quite quickly, a slightly slower but numerically more robust method is chosen by default. If your data is small, or you don't need the numeric stability then you may choose to also opt out of stable.



License
-------

The code in this repository has an BSD 2-Clause "Simplified" License.

See [LICENSE](https://github.com/Feelx234/microagg1d/blob/master/LICENSE).



Citation
-----------

This code was created as part of the research for the following publication. If you use this package please cite:

```
@article{stamm2024faster,
	title = {Faster optimal univariate microaggregation},
	url = {https://openreview.net/forum?id=s5lEUtyVly},
	journal = {Transactions on Machine Learning Research},
	author = {Stamm, Felix I. and Schaub, Michael T},
	year = {2024},
}
```


References
----------

- Hansen, S.L. and Mukherjee, S., 2003. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering
- Wilber, R., 1988. The concave least-weight subsequence problem revisited. Journal of Algorithms, 9(3), pp.418-425.
- Galil, Z. and Park, K., 1989. A linear-time algorithm for concave one-dimensional dynamic programming.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "microagg1d",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "microaggregation, clustering, kmeans, k-means, anonymity",
    "author": null,
    "author_email": "\"Felix I. Stamm\" <felix.stamm@rwth-aachen.de>",
    "download_url": "https://files.pythonhosted.org/packages/44/3d/863c69321e4a9966e2f808dd5d9d60ea2cd714384973ea18bb19593e6016/microagg1d-0.4.0.tar.gz",
    "platform": null,
    "description": "[![build](https://github.com/Feelx234/microagg1d/actions/workflows/ci.yaml/badge.svg)](https://github.com/Feelx234/microagg1d/actions)\n[![Coverage Status](https://coveralls.io/repos/github/Feelx234/microagg1d/badge.svg)](https://coveralls.io/github/Feelx234/microagg1d)\n[![arXiv:2401.02381](https://img.shields.io/badge/arXiv-2401.02381-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/2401.02381)\n\nmicroagg1d\n========\n\nA Python library which implements different techniques for optimal univariate microaggregation. The two main parameters that determine the runtime are the length n of the input array and minimal class size k. This package offers both O(n) (fast for large k) and O(kn) (fast for small k) algorithms.\n\nThe code is written in Python and relies on the [numba](https://numba.pydata.org/) compiler for speed.\n\nRequirements\n------------\n\n*microagg1d* relies on `numpy` and `numba` which currently support python 3.8-3.11.\n\nInstallation\n------------\n\nmicroagg1d is available on [PyPI](https://pypi.python.org/pypi/microagg1d), the Python Package Index.\n\n```sh\n$ pip3 install microagg1d\n```\n\nExample Usage\n-------------\n\n```python\nfrom microagg1d import univariate_microaggregation\n\nx = [5, 1, 1, 1.1, 5, 1, 5.1]\n\nclusters = univariate_microaggregation(x, k=3)\n\nprint(clusters)   # [1 0 0 0 1 0 1]\n\n# explicitly choose method / algorithm\nclusters2 = univariate_microaggregation(x, k=3, method=\"wilber\")\n\nprint(clusters2)   # [1 0 0 0 1 0 1]\n\n# choose a different cost (sae / sse / roundup / rounddown / maxdist)\nclusters3 = univariate_microaggregation(x, k=3, cost=\"sae\")\n\nprint(clusters3)   # [1 0 0 0 1 0 1]\n```\n\n**Important notice**: On first import the the code is compiled once which may take about 30s. On subsequent imports this is no longer necessary and imports are almost instant.\n\nTests\n-----\n\nTests are in [tests/](https://github.com/Feelx234/microagg1d/tree/main/tests).\n\n```sh\n# Run tests\n$ python3 -m pytest .\n```\n\nMethod Details\n--------------\n\nCurrently the package implements the following methods:\n- `\"simple\"` [O(nk), faster for small k]\n- `\"wilber\"` [O(n), faster for larger k]\n- `\"galil_park\"` [O(n), fewer calls to SMAWK]\n- `\"staggered\"` [fastest O(n)]\n\nBy default, the package switches between the simple and wilber method depending on the size of k.\n\nBoth methods rely on a prefix sum approach to compute the cluster cost. As the prefix sums tend to become very large quite quickly, a slightly slower but numerically more robust method is chosen by default. If your data is small, or you don't need the numeric stability then you may choose to also opt out of stable.\n\n\n\nLicense\n-------\n\nThe code in this repository has an BSD 2-Clause \"Simplified\" License.\n\nSee [LICENSE](https://github.com/Feelx234/microagg1d/blob/master/LICENSE).\n\n\n\nCitation\n-----------\n\nThis code was created as part of the research for the following publication. If you use this package please cite:\n\n```\n@article{stamm2024faster,\n\ttitle = {Faster optimal univariate microaggregation},\n\turl = {https://openreview.net/forum?id=s5lEUtyVly},\n\tjournal = {Transactions on Machine Learning Research},\n\tauthor = {Stamm, Felix I. and Schaub, Michael T},\n\tyear = {2024},\n}\n```\n\n\nReferences\n----------\n\n- Hansen, S.L. and Mukherjee, S., 2003. A polynomial algorithm for optimal univariate microaggregation. IEEE Transactions on Knowledge and Data Engineering\n- Wilber, R., 1988. The concave least-weight subsequence problem revisited. Journal of Algorithms, 9(3), pp.418-425.\n- Galil, Z. and Park, K., 1989. A linear-time algorithm for concave one-dimensional dynamic programming.\n",
    "bugtrack_url": null,
    "license": "BSD 2-Clause License  Copyright (c) 2023, Feelx234  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ",
    "summary": "A package to perform optimal univariate microaggregation for various cost functions.",
    "version": "0.4.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/Feelx234/microagg1d/issues",
        "Homepage": "https://github.com/Feelx234/microagg1d"
    },
    "split_keywords": [
        "microaggregation",
        " clustering",
        " kmeans",
        " k-means",
        " anonymity"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1a949a497d8f7367a1bf6a571c12758bb95e58e865f6e7a32dfbf20e2c990ffd",
                "md5": "acc7c59c95f273304e11877158498c2d",
                "sha256": "521a206ecf539401e7e892dc679dfbcf16bb2931aa15a96804c8d1f64a7e7215"
            },
            "downloads": -1,
            "filename": "microagg1d-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "acc7c59c95f273304e11877158498c2d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 30637,
            "upload_time": "2025-01-08T16:06:55",
            "upload_time_iso_8601": "2025-01-08T16:06:55.937619Z",
            "url": "https://files.pythonhosted.org/packages/1a/94/9a497d8f7367a1bf6a571c12758bb95e58e865f6e7a32dfbf20e2c990ffd/microagg1d-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "443d863c69321e4a9966e2f808dd5d9d60ea2cd714384973ea18bb19593e6016",
                "md5": "18acc0e6171abaf3545721846e86a73e",
                "sha256": "c0257360bf7d4733209f3de5d2373689095e58540c47041ac2a66c9bf14df43e"
            },
            "downloads": -1,
            "filename": "microagg1d-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "18acc0e6171abaf3545721846e86a73e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 586476,
            "upload_time": "2025-01-08T16:06:58",
            "upload_time_iso_8601": "2025-01-08T16:06:58.157314Z",
            "url": "https://files.pythonhosted.org/packages/44/3d/863c69321e4a9966e2f808dd5d9d60ea2cd714384973ea18bb19593e6016/microagg1d-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-08 16:06:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Feelx234",
    "github_project": "microagg1d",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "microagg1d"
}
        
Elapsed time: 0.45732s