geom-median

Name	geom-median JSON
Version	0.1.0 JSON
	download
home_page	https://github.com/krishnap25/geom_median
Summary	Implementation of the smoothed Weiszfeld algorithm to compute the geometric median
upload_time	2022-01-19 04:41:05
maintainer
docs_url	None
author	Krishna Pillutla
requires_python	>=3.6
license
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # Differentiable and Fast Geometric Median in NumPy and PyTorch

This package implements a fast numerical algorithm to compute the geometric median of high dimensional vectors.
As a generalization of the median (of scalars), the [geometric median](https://en.wikipedia.org/wiki/Geometric_median) 
is a robust estimator of the mean in the presence of outliers and contaminations (adversarial or otherwise). 

![definition](fig/gm.jpg)

The geometric median is also known as the Fermat point, Weber's L1 median, Fréchet median among others. 
It has a breakdown point of 0.5, meaning that it yields a robust aggregate even under arbitrary corruptions to points accounting for under half the total weight. We use the smoothed Weiszfeld algorithm to compute the geometric median. 

**Features**:
- Implementation in both NumPy and PyTorch.
- PyTorch implementation is fully differentiable (compatible with gradient backpropagation a.k.a. automatic differentiation) and can run on GPUs with CUDA tensors.
- Blazing fast algorithm that converges linearly in almost all practical settings. 

# Installation
This package can be installed via pip as `pip install geom_median`. Alternatively, for an editable install, 
run
```bash
git clone git@github.com:krishnap25/geom_median.git
cd geom_median
pip install -e .
```

You must have a working installation of PyTorch, version 1.7 or over in case you wish to use the PyTorch API. 
See details [here](https://pytorch.org/get-started/locally/).

# Usage Guide
We describe the PyTorch usage here. The NumPy API is entirely analogous. 

```python
import torch
from geom_median.torch import compute_geometric_median   # PyTorch API
# from geom_median.numpy import compute_geometric_median  # NumPy API
```

For the simplest use case, supply a list of tensors: 

```python
n = 10  # Number of vectors
d = 25  # dimensionality of each vector
points = [torch.rand(d) for _ in range(n)]   # list of n tensors of shape (d,)
# The shape of each tensor is the same and can be arbitrary (not necessarily 1-dimensional)
weights = torch.rand(n)  # non-negative weights of shape (n,)
out = compute_geometric_median(points, weights)
# Access the median via `out.median`, which has the same shape as the points, i.e., (d,)
```
The termination condition can be examined through `out.termination`, which gives a message such as 
`"function value converged within tolerance"` or `"maximum iterations reached"`.

We also support a use case where each point is given by list of tensors. 
For instance, each point is the list of parameters of a `torch.nn.Module` for instance as `point = list(module.parameters())`.
In this case, this is equivalent to flattening and concatenating all the tensors into a single vector via 
`flatted_point = torch.stack([v.view(-1) for v in point])`.
This functionality can be invoked as follows: 

```python
models = [torch.nn.Linear(20, 10) for _ in range(n)]  # a list of n models
points = [list(model.parameters()) for model in models]  # list of points, where each point is a list of tensors
out = compute_geometric_median(points, weights=None)  # equivalent to `weights = torch.ones(n)`. 
# Access the median via `out.median`, also given as a list of tensors
```

We also support computing the geometric median for each component separately in the list-of-tensors format:
```python
models = [torch.nn.Linear(20, 10) for _ in range(n)]  # a list of n models
points = [list(model.parameters()) for model in models]  # list of points, where each point is a list of tensors
out = compute_geometric_median(points, weights=None, per_component=True)  
# Access the median via `out.median`, also given as a list of tensors
```
This per-component geometric median is equivalent in functionality to 
```python
out.median[j] = compute_geometric_median([p[j] for p in points], weights)
```

## Backpropagation support
When using the PyTorch API, the result `out.median`, as a function of `points`, supports gradient backpropagation, also known as reverse-mode automatic differentiation. Here is a toy example illustrating this behavior.
```python
points = [torch.rand(d).requires_grad_(True) for _ in range(n)]   # list of tensors with `requires_grad=True`
out = compute_geometric_median(points, weights=None)
torch.linalg.norm(out.median).backward()  # call backward on any downstream function of `out.median`
gradients = [p.grad for p in points]  # gradients with respect of `points` and upstream nodes in the computation graph
```

## GPU support
Simply use as above where `points` and `weights` are CUDA tensors. 

# Authors and Contact
[Krishna Pillutla](https://krishnap25.github.io/)   
[Sham Kakade](https://sham.seas.harvard.edu/)   
[Zaid Harchaoui](https://faculty.washington.edu/zaid/)

In case of questions, please raise an issue on GitHub. 

# Citation
If you found this package useful, please consider citing this paper. 

```
@article{pillutla:etal:rfa ,
  title={{Robust Aggregation for Federated Learning}},
  author={Pillutla, Krishna and  Kakade, Sham M. and Harchaoui, Zaid},
  journal={arXiv preprint},
  year={2019}
}
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/krishnap25/geom_median",
    "name": "geom-median",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Krishna Pillutla",
    "author_email": "pillutla@cs.washington.edu",
    "download_url": "https://files.pythonhosted.org/packages/0e/b9/13101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3/geom_median-0.1.0.tar.gz",
    "platform": "",
    "description": "# Differentiable and Fast Geometric Median in NumPy and PyTorch\n\nThis package implements a fast numerical algorithm to compute the geometric median of high dimensional vectors.\nAs a generalization of the median (of scalars), the [geometric median](https://en.wikipedia.org/wiki/Geometric_median) \nis a robust estimator of the mean in the presence of outliers and contaminations (adversarial or otherwise). \n\n![definition](fig/gm.jpg)\n\nThe geometric median is also known as the Fermat point, Weber's L1 median, Fr\u00e9chet median among others. \nIt has a breakdown point of 0.5, meaning that it yields a robust aggregate even under arbitrary corruptions to points accounting for under half the total weight. We use the smoothed Weiszfeld algorithm to compute the geometric median. \n\n**Features**:\n- Implementation in both NumPy and PyTorch.\n- PyTorch implementation is fully differentiable (compatible with gradient backpropagation a.k.a. automatic differentiation) and can run on GPUs with CUDA tensors.\n- Blazing fast algorithm that converges linearly in almost all practical settings. \n\n# Installation\nThis package can be installed via pip as `pip install geom_median`. Alternatively, for an editable install, \nrun\n```bash\ngit clone git@github.com:krishnap25/geom_median.git\ncd geom_median\npip install -e .\n```\n\nYou must have a working installation of PyTorch, version 1.7 or over in case you wish to use the PyTorch API. \nSee details [here](https://pytorch.org/get-started/locally/).\n\n# Usage Guide\nWe describe the PyTorch usage here. The NumPy API is entirely analogous. \n\n```python\nimport torch\nfrom geom_median.torch import compute_geometric_median   # PyTorch API\n# from geom_median.numpy import compute_geometric_median  # NumPy API\n```\n\nFor the simplest use case, supply a list of tensors: \n\n```python\nn = 10  # Number of vectors\nd = 25  # dimensionality of each vector\npoints = [torch.rand(d) for _ in range(n)]   # list of n tensors of shape (d,)\n# The shape of each tensor is the same and can be arbitrary (not necessarily 1-dimensional)\nweights = torch.rand(n)  # non-negative weights of shape (n,)\nout = compute_geometric_median(points, weights)\n# Access the median via `out.median`, which has the same shape as the points, i.e., (d,)\n```\nThe termination condition can be examined through `out.termination`, which gives a message such as \n`\"function value converged within tolerance\"` or `\"maximum iterations reached\"`.\n\nWe also support a use case where each point is given by list of tensors. \nFor instance, each point is the list of parameters of a `torch.nn.Module` for instance as `point = list(module.parameters())`.\nIn this case, this is equivalent to flattening and concatenating all the tensors into a single vector via \n`flatted_point = torch.stack([v.view(-1) for v in point])`.\nThis functionality can be invoked as follows: \n\n```python\nmodels = [torch.nn.Linear(20, 10) for _ in range(n)]  # a list of n models\npoints = [list(model.parameters()) for model in models]  # list of points, where each point is a list of tensors\nout = compute_geometric_median(points, weights=None)  # equivalent to `weights = torch.ones(n)`. \n# Access the median via `out.median`, also given as a list of tensors\n```\n\nWe also support computing the geometric median for each component separately in the list-of-tensors format:\n```python\nmodels = [torch.nn.Linear(20, 10) for _ in range(n)]  # a list of n models\npoints = [list(model.parameters()) for model in models]  # list of points, where each point is a list of tensors\nout = compute_geometric_median(points, weights=None, per_component=True)  \n# Access the median via `out.median`, also given as a list of tensors\n```\nThis per-component geometric median is equivalent in functionality to \n```python\nout.median[j] = compute_geometric_median([p[j] for p in points], weights)\n```\n\n## Backpropagation support\nWhen using the PyTorch API, the result `out.median`, as a function of `points`, supports gradient backpropagation, also known as reverse-mode automatic differentiation. Here is a toy example illustrating this behavior.\n```python\npoints = [torch.rand(d).requires_grad_(True) for _ in range(n)]   # list of tensors with `requires_grad=True`\nout = compute_geometric_median(points, weights=None)\ntorch.linalg.norm(out.median).backward()  # call backward on any downstream function of `out.median`\ngradients = [p.grad for p in points]  # gradients with respect of `points` and upstream nodes in the computation graph\n```\n\n## GPU support\nSimply use as above where `points` and `weights` are CUDA tensors. \n\n# Authors and Contact\n[Krishna Pillutla](https://krishnap25.github.io/)   \n[Sham Kakade](https://sham.seas.harvard.edu/)   \n[Zaid Harchaoui](https://faculty.washington.edu/zaid/)\n\nIn case of questions, please raise an issue on GitHub. \n\n# Citation\nIf you found this package useful, please consider citing this paper. \n\n```\n@article{pillutla:etal:rfa ,\n  title={{Robust Aggregation for Federated Learning}},\n  author={Pillutla, Krishna and  Kakade, Sham M. and Harchaoui, Zaid},\n  journal={arXiv preprint},\n  year={2019}\n}\n```\n\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Implementation of the smoothed Weiszfeld algorithm to compute the geometric median",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/krishnap25/geom_median/issues",
        "Homepage": "https://github.com/krishnap25/geom_median"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0eb913101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3",
                "md5": "75e1294eb82824e5771fcab060b0ad53",
                "sha256": "5a5bd8c930fde58febcf00aab9ecea5f12a54bc75589e5063754deb5b6e90495"
            },
            "downloads": -1,
            "filename": "geom_median-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "75e1294eb82824e5771fcab060b0ad53",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 20904,
            "upload_time": "2022-01-19T04:41:05",
            "upload_time_iso_8601": "2022-01-19T04:41:05.294118Z",
            "url": "https://files.pythonhosted.org/packages/0e/b9/13101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3/geom_median-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-01-19 04:41:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "krishnap25",
    "github_project": "geom_median",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "geom-median"
}

Krishna Pillutla