# Differentiable and Fast Geometric Median in NumPy and PyTorch
This package implements a fast numerical algorithm to compute the geometric median of high dimensional vectors.
As a generalization of the median (of scalars), the [geometric median](https://en.wikipedia.org/wiki/Geometric_median)
is a robust estimator of the mean in the presence of outliers and contaminations (adversarial or otherwise).

The geometric median is also known as the Fermat point, Weber's L1 median, Fréchet median among others.
It has a breakdown point of 0.5, meaning that it yields a robust aggregate even under arbitrary corruptions to points accounting for under half the total weight. We use the smoothed Weiszfeld algorithm to compute the geometric median.
**Features**:
- Implementation in both NumPy and PyTorch.
- PyTorch implementation is fully differentiable (compatible with gradient backpropagation a.k.a. automatic differentiation) and can run on GPUs with CUDA tensors.
- Blazing fast algorithm that converges linearly in almost all practical settings.
# Installation
This package can be installed via pip as `pip install geom_median`. Alternatively, for an editable install,
run
```bash
git clone git@github.com:krishnap25/geom_median.git
cd geom_median
pip install -e .
```
You must have a working installation of PyTorch, version 1.7 or over in case you wish to use the PyTorch API.
See details [here](https://pytorch.org/get-started/locally/).
# Usage Guide
We describe the PyTorch usage here. The NumPy API is entirely analogous.
```python
import torch
from geom_median.torch import compute_geometric_median # PyTorch API
# from geom_median.numpy import compute_geometric_median # NumPy API
```
For the simplest use case, supply a list of tensors:
```python
n = 10 # Number of vectors
d = 25 # dimensionality of each vector
points = [torch.rand(d) for _ in range(n)] # list of n tensors of shape (d,)
# The shape of each tensor is the same and can be arbitrary (not necessarily 1-dimensional)
weights = torch.rand(n) # non-negative weights of shape (n,)
out = compute_geometric_median(points, weights)
# Access the median via `out.median`, which has the same shape as the points, i.e., (d,)
```
The termination condition can be examined through `out.termination`, which gives a message such as
`"function value converged within tolerance"` or `"maximum iterations reached"`.
We also support a use case where each point is given by list of tensors.
For instance, each point is the list of parameters of a `torch.nn.Module` for instance as `point = list(module.parameters())`.
In this case, this is equivalent to flattening and concatenating all the tensors into a single vector via
`flatted_point = torch.stack([v.view(-1) for v in point])`.
This functionality can be invoked as follows:
```python
models = [torch.nn.Linear(20, 10) for _ in range(n)] # a list of n models
points = [list(model.parameters()) for model in models] # list of points, where each point is a list of tensors
out = compute_geometric_median(points, weights=None) # equivalent to `weights = torch.ones(n)`.
# Access the median via `out.median`, also given as a list of tensors
```
We also support computing the geometric median for each component separately in the list-of-tensors format:
```python
models = [torch.nn.Linear(20, 10) for _ in range(n)] # a list of n models
points = [list(model.parameters()) for model in models] # list of points, where each point is a list of tensors
out = compute_geometric_median(points, weights=None, per_component=True)
# Access the median via `out.median`, also given as a list of tensors
```
This per-component geometric median is equivalent in functionality to
```python
out.median[j] = compute_geometric_median([p[j] for p in points], weights)
```
## Backpropagation support
When using the PyTorch API, the result `out.median`, as a function of `points`, supports gradient backpropagation, also known as reverse-mode automatic differentiation. Here is a toy example illustrating this behavior.
```python
points = [torch.rand(d).requires_grad_(True) for _ in range(n)] # list of tensors with `requires_grad=True`
out = compute_geometric_median(points, weights=None)
torch.linalg.norm(out.median).backward() # call backward on any downstream function of `out.median`
gradients = [p.grad for p in points] # gradients with respect of `points` and upstream nodes in the computation graph
```
## GPU support
Simply use as above where `points` and `weights` are CUDA tensors.
# Authors and Contact
[Krishna Pillutla](https://krishnap25.github.io/)
[Sham Kakade](https://sham.seas.harvard.edu/)
[Zaid Harchaoui](https://faculty.washington.edu/zaid/)
In case of questions, please raise an issue on GitHub.
# Citation
If you found this package useful, please consider citing this paper.
```
@article{pillutla:etal:rfa ,
title={{Robust Aggregation for Federated Learning}},
author={Pillutla, Krishna and Kakade, Sham M. and Harchaoui, Zaid},
journal={arXiv preprint},
year={2019}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/krishnap25/geom_median",
"name": "geom-median",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Krishna Pillutla",
"author_email": "pillutla@cs.washington.edu",
"download_url": "https://files.pythonhosted.org/packages/0e/b9/13101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3/geom_median-0.1.0.tar.gz",
"platform": "",
"description": "# Differentiable and Fast Geometric Median in NumPy and PyTorch\n\nThis package implements a fast numerical algorithm to compute the geometric median of high dimensional vectors.\nAs a generalization of the median (of scalars), the [geometric median](https://en.wikipedia.org/wiki/Geometric_median) \nis a robust estimator of the mean in the presence of outliers and contaminations (adversarial or otherwise). \n\n\n\nThe geometric median is also known as the Fermat point, Weber's L1 median, Fr\u00e9chet median among others. \nIt has a breakdown point of 0.5, meaning that it yields a robust aggregate even under arbitrary corruptions to points accounting for under half the total weight. We use the smoothed Weiszfeld algorithm to compute the geometric median. \n\n**Features**:\n- Implementation in both NumPy and PyTorch.\n- PyTorch implementation is fully differentiable (compatible with gradient backpropagation a.k.a. automatic differentiation) and can run on GPUs with CUDA tensors.\n- Blazing fast algorithm that converges linearly in almost all practical settings. \n\n# Installation\nThis package can be installed via pip as `pip install geom_median`. Alternatively, for an editable install, \nrun\n```bash\ngit clone git@github.com:krishnap25/geom_median.git\ncd geom_median\npip install -e .\n```\n\nYou must have a working installation of PyTorch, version 1.7 or over in case you wish to use the PyTorch API. \nSee details [here](https://pytorch.org/get-started/locally/).\n\n# Usage Guide\nWe describe the PyTorch usage here. The NumPy API is entirely analogous. \n\n```python\nimport torch\nfrom geom_median.torch import compute_geometric_median # PyTorch API\n# from geom_median.numpy import compute_geometric_median # NumPy API\n```\n\nFor the simplest use case, supply a list of tensors: \n\n```python\nn = 10 # Number of vectors\nd = 25 # dimensionality of each vector\npoints = [torch.rand(d) for _ in range(n)] # list of n tensors of shape (d,)\n# The shape of each tensor is the same and can be arbitrary (not necessarily 1-dimensional)\nweights = torch.rand(n) # non-negative weights of shape (n,)\nout = compute_geometric_median(points, weights)\n# Access the median via `out.median`, which has the same shape as the points, i.e., (d,)\n```\nThe termination condition can be examined through `out.termination`, which gives a message such as \n`\"function value converged within tolerance\"` or `\"maximum iterations reached\"`.\n\nWe also support a use case where each point is given by list of tensors. \nFor instance, each point is the list of parameters of a `torch.nn.Module` for instance as `point = list(module.parameters())`.\nIn this case, this is equivalent to flattening and concatenating all the tensors into a single vector via \n`flatted_point = torch.stack([v.view(-1) for v in point])`.\nThis functionality can be invoked as follows: \n\n```python\nmodels = [torch.nn.Linear(20, 10) for _ in range(n)] # a list of n models\npoints = [list(model.parameters()) for model in models] # list of points, where each point is a list of tensors\nout = compute_geometric_median(points, weights=None) # equivalent to `weights = torch.ones(n)`. \n# Access the median via `out.median`, also given as a list of tensors\n```\n\nWe also support computing the geometric median for each component separately in the list-of-tensors format:\n```python\nmodels = [torch.nn.Linear(20, 10) for _ in range(n)] # a list of n models\npoints = [list(model.parameters()) for model in models] # list of points, where each point is a list of tensors\nout = compute_geometric_median(points, weights=None, per_component=True) \n# Access the median via `out.median`, also given as a list of tensors\n```\nThis per-component geometric median is equivalent in functionality to \n```python\nout.median[j] = compute_geometric_median([p[j] for p in points], weights)\n```\n\n## Backpropagation support\nWhen using the PyTorch API, the result `out.median`, as a function of `points`, supports gradient backpropagation, also known as reverse-mode automatic differentiation. Here is a toy example illustrating this behavior.\n```python\npoints = [torch.rand(d).requires_grad_(True) for _ in range(n)] # list of tensors with `requires_grad=True`\nout = compute_geometric_median(points, weights=None)\ntorch.linalg.norm(out.median).backward() # call backward on any downstream function of `out.median`\ngradients = [p.grad for p in points] # gradients with respect of `points` and upstream nodes in the computation graph\n```\n\n## GPU support\nSimply use as above where `points` and `weights` are CUDA tensors. \n\n# Authors and Contact\n[Krishna Pillutla](https://krishnap25.github.io/) \n[Sham Kakade](https://sham.seas.harvard.edu/) \n[Zaid Harchaoui](https://faculty.washington.edu/zaid/)\n\nIn case of questions, please raise an issue on GitHub. \n\n# Citation\nIf you found this package useful, please consider citing this paper. \n\n```\n@article{pillutla:etal:rfa ,\n title={{Robust Aggregation for Federated Learning}},\n author={Pillutla, Krishna and Kakade, Sham M. and Harchaoui, Zaid},\n journal={arXiv preprint},\n year={2019}\n}\n```\n\n\n",
"bugtrack_url": null,
"license": "",
"summary": "Implementation of the smoothed Weiszfeld algorithm to compute the geometric median",
"version": "0.1.0",
"project_urls": {
"Bug Tracker": "https://github.com/krishnap25/geom_median/issues",
"Homepage": "https://github.com/krishnap25/geom_median"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0eb913101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3",
"md5": "75e1294eb82824e5771fcab060b0ad53",
"sha256": "5a5bd8c930fde58febcf00aab9ecea5f12a54bc75589e5063754deb5b6e90495"
},
"downloads": -1,
"filename": "geom_median-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "75e1294eb82824e5771fcab060b0ad53",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 20904,
"upload_time": "2022-01-19T04:41:05",
"upload_time_iso_8601": "2022-01-19T04:41:05.294118Z",
"url": "https://files.pythonhosted.org/packages/0e/b9/13101da828812b56df9e8fa12a62f24408dcb416edcd6b3b091c973306b3/geom_median-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-01-19 04:41:05",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "krishnap25",
"github_project": "geom_median",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "geom-median"
}