Welford-Remove
==============

This library is a Python (NumPy) implementation of a modified Welford’s
algorithm, an online and parallel algorithm for calculating variances.
The standard Welford’s algorithm only allows adding data points; this
modification also allows removing them.

Welford’s algorithm is described in the following:

- `Wikipedia: Welford Online
  Algorithm <https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm>`__
- `Wikipedia: Welford Parallel
  Algorithm <https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm>`__

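
For reference, the online update from the first link can be sketched for a
single stream of scalars. This is a minimal pure-Python illustration under
stated assumptions, not this library's code: the class name ``RunningStats``
and its attributes are hypothetical, and the library itself works on NumPy
arrays. The accumulator keeps the count ``n``, the running mean, and ``m2``,
the sum of squared deviations from the current mean.

.. code:: python

    import math


    class RunningStats:
        """Hypothetical scalar version of Welford's online algorithm."""

        def __init__(self):
            self.n = 0       # number of samples seen so far
            self.mean = 0.0  # running mean
            self.m2 = 0.0    # sum of squared deviations from the running mean

        def add(self, x):
            self.n += 1
            delta = x - self.mean               # deviation from the old mean
            self.mean += delta / self.n         # update the mean
            self.m2 += delta * (x - self.mean)  # old deviation times new deviation

        @property
        def var_s(self):
            # Sample variance (divides by n - 1); undefined for fewer than 2 samples.
            return self.m2 / (self.n - 1) if self.n > 1 else math.nan

        @property
        def var_p(self):
            # Population variance (divides by n); undefined for an empty stream.
            return self.m2 / self.n if self.n > 0 else math.nan


    stats = RunningStats()
    for x in [0, 1, 2]:
        stats.add(x)
    print(stats.mean, stats.var_s, stats.var_p)  # 1.0 1.0 0.6666666666666666

These values match the first column of the online example further down
(mean 1, sample variance 1, population variance 2/3).
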
The modification for removing data points is described here:

- `StackOverflow
  Post <https://stackoverflow.com/questions/30876298/removing-a-prior-sample-while-using-welfords-method-for-computing-single-pass-v>`__

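
The removal step can be sketched by inverting the online update: given the
current count ``n``, mean, and ``m2`` (sum of squared deviations), removing a
previously added value ``x`` gives ``mean' = mean - (x - mean) / (n - 1)`` and
``m2' = m2 - (x - mean) * (x - mean')``. The function name ``welford_remove``
and its tuple-based state are hypothetical, and the sketch works on scalars
rather than NumPy arrays. It assumes ``x`` really was one of the added
samples; removing a value that was never added silently corrupts the state.

.. code:: python

    def welford_remove(n, mean, m2, x):
        """Hypothetical helper: undo one Welford update for a previously added x.

        Returns the (n, mean, m2) state of the remaining n - 1 samples.
        """
        if n <= 1:
            # Removing the last sample leaves an empty accumulator.
            return 0, 0.0, 0.0
        new_n = n - 1
        new_mean = mean - (x - mean) / new_n
        # Reverse of the add step, which did m2 += (x - old_mean) * (x - new_mean).
        new_m2 = m2 - (x - mean) * (x - new_mean)
        return new_n, new_mean, new_m2


    # State for the samples [0, 1, 2, 3, 4]: n = 5, mean = 2, m2 = 10.
    state = (5, 2.0, 10.0)
    state = welford_remove(*state, 3)  # remaining samples: [0, 1, 2, 4]
    state = welford_remove(*state, 4)  # remaining samples: [0, 1, 2]
    n, mean, m2 = state
    print(n, mean, m2 / (n - 1), m2 / n)  # 3 1.0 1.0 0.6666666666666666

This mirrors the first column of the ``remove`` calls in the online example:
after removing 3 and 4, the statistics return to those of ``[0, 1, 2]``.
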
Welford’s original method is more numerically stable than the naive
one-pass formula based on the sum and the sum of squares, as described in
the following blog post:

- `Accurately computing
  running variance <https://www.johndcook.com/blog/standard_deviation>`__

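
To make the stability claim concrete, the sketch below compares the naive
one-pass formula (sum of squares minus the squared sum over ``n``) with
Welford's update on four values sharing a large offset of ``1e9``; their exact
sample variance is 30. Both function names are hypothetical and the code is
plain Python, not this library. In double precision the squares are around
``1e18``, where adjacent floats are 128 apart, so the naive subtraction is
dominated by rounding error. Welford's update only works with small deviations
from the running mean and recovers 30 exactly for this input.

.. code:: python

    def naive_variance(xs):
        # Textbook one-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).
        n = len(xs)
        s = sum(xs)
        sq = sum(x * x for x in xs)
        return (sq - s * s / n) / (n - 1)


    def welford_variance(xs):
        # Welford's online update, returning the sample variance.
        n, mean, m2 = 0, 0.0, 0.0
        for x in xs:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        return m2 / (n - 1)


    offset = 1e9
    data = [offset + v for v in (4.0, 7.0, 13.0, 16.0)]  # exact sample variance: 30
    print("naive:  ", naive_variance(data))    # dominated by rounding error
    print("welford:", welford_variance(data))  # welford: 30.0

Without the offset both functions return 30.0, so the difference comes
entirely from cancellation in the naive formula.
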
However, the modified version of the algorithm provided here has not been
formally analyzed for numerical stability. Based on the testing done in
``test_welford.test_remove``, I have reason to believe that it is stable.

This library is inspired by jvf’s implementation, which does not use
NumPy. More directly, it is a fork of a-mitani’s implementation:

- Implementation by jvf: https://github.com/jvf/welford
- Implementation by a-mitani: https://github.com/a-mitani/welford

Install
-------

Download the package from the `PyPI
repository <https://pypi.org/project/welford-remove/>`__:

::

    $ pip install welford-remove

Example
-------

For Online Calculation
~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    import numpy as np
    from welford import Welford

    # Initialize a Welford object
    w = Welford()

    # Input data samples sequentially
    w.add(np.array([0, 100]))
    w.add(np.array([1, 110]))
    w.add(np.array([2, 120]))

    # Output
    print(w.mean)   # mean --> [1. 110.]
    print(w.var_s)  # sample variance --> [1. 100.]
    print(w.var_p)  # population variance --> [0.66666667 66.66666667]

    # You can add more samples after calculating variances.
    w.add(np.array([3, 130]))
    w.add(np.array([4, 140]))

    # Output with the added samples
    print(w.mean)   # mean --> [2. 120.]
    print(w.var_s)  # sample variance --> [2.5 250.]
    print(w.var_p)  # population variance --> [2. 200.]

    # You can also remove previously added samples.
    w.remove(np.array([3, 130]))
    w.remove(np.array([4, 140]))
    print(w.mean)   # mean --> [1. 110.]
    print(w.var_s)  # sample variance --> [1. 100.]
    print(w.var_p)  # population variance --> [0.66666667 66.66666667]

    # You can also get the standard deviations
    print(w.std_s)  # sample standard deviation --> [1. 10.]
    print(w.std_p)  # population standard deviation --> [0.81649658 8.16496581]

The Welford object also supports initialization with data samples and
batch addition of samples.

.. code:: python

    import numpy as np
    from welford import Welford

    # Initialize a Welford object with samples (one sample per row)
    ini = np.array([[0, 100], [1, 110], [2, 120]])
    w = Welford(ini)

    # Output
    print(w.mean)   # mean --> [1. 110.]
    print(w.var_s)  # sample variance --> [1. 100.]
    print(w.var_p)  # population variance --> [0.66666667 66.66666667]

    # Add more samples with the batch method
    other_samples = np.array([[3, 130], [4, 140]])
    w.add_all(other_samples)

    # Output with the added samples
    print(w.mean)   # mean --> [2. 120.]
    print(w.var_s)  # sample variance --> [2.5 250.]
    print(w.var_p)  # population variance --> [2. 200.]

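
Initializing from a batch, as ``Welford(ini)`` does, can be thought of as
computing the count, mean, and sum of squared deviations of the batch
directly and then continuing with online updates. The sketch below is an
assumption about how such an initialization could work, not this library's
code; ``state_from_batch`` is a hypothetical name, and it handles a single
column of scalars rather than a 2-D NumPy array.

.. code:: python

    def state_from_batch(xs):
        """Hypothetical helper: build Welford's (n, mean, m2) state from a batch.

        Uses a two-pass computation: first the mean, then the squared deviations.
        """
        n = len(xs)
        if n == 0:
            return 0, 0.0, 0.0
        mean = sum(xs) / n
        m2 = sum((x - mean) ** 2 for x in xs)
        return n, mean, m2


    n, mean, m2 = state_from_batch([0, 1, 2])  # first column of ``ini`` above
    print(n, mean, m2 / (n - 1), m2 / n)  # 3 1.0 1.0 0.6666666666666666

The two-pass form avoids the cancellation problem of the naive formula because
it subtracts the mean before squaring.
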
For Parallel Calculation
~~~~~~~~~~~~~~~~~~~~~~~~

Welford also offers a parallel calculation method for variance.

.. code:: python

    import numpy as np
    from welford import Welford

    # Initialize two Welford objects
    w_1 = Welford()
    w_2 = Welford()

    # Each object calculates the variance of its own samples in parallel.
    # On w_1
    w_1.add(np.array([0, 100]))
    w_1.add(np.array([1, 110]))
    w_1.add(np.array([2, 120]))
    print(w_1.var_s)  # sample variance --> [1. 100.]
    print(w_1.var_p)  # population variance --> [0.66666667 66.66666667]

    # On w_2
    w_2.add(np.array([3, 130]))
    w_2.add(np.array([4, 140]))
    print(w_2.var_s)  # sample variance --> [0.5 50.]
    print(w_2.var_p)  # population variance --> [0.25 25.]

    # Merge w_2 into w_1 to get the variance of ALL samples
    w_1.merge(w_2)
    print(w_1.var_s)  # sample variance --> [2.5 250.]
    print(w_1.var_p)  # population variance --> [2. 200.]

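
The merge step follows the parallel algorithm from the Wikipedia link above
(due to Chan et al.): for two partial states ``(n_a, mean_a, m2_a)`` and
``(n_b, mean_b, m2_b)``, with ``delta = mean_b - mean_a`` and
``n = n_a + n_b``, the combined mean is ``mean_a + delta * n_b / n`` and the
combined ``m2`` is ``m2_a + m2_b + delta**2 * n_a * n_b / n``. The function
below is a hypothetical scalar sketch of that formula, not the library's
``merge`` method.

.. code:: python

    def welford_merge(a, b):
        """Hypothetical helper: combine two (n, mean, m2) states (Chan et al.)."""
        n_a, mean_a, m2_a = a
        n_b, mean_b, m2_b = b
        n = n_a + n_b
        if n == 0:
            return 0, 0.0, 0.0
        delta = mean_b - mean_a
        mean = mean_a + delta * n_b / n
        # Within-group sums plus a correction for the gap between the group means.
        m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
        return n, mean, m2


    # First column of the example: w_1 saw [0, 1, 2], w_2 saw [3, 4].
    state_1 = (3, 1.0, 2.0)
    state_2 = (2, 3.5, 0.5)
    n, mean, m2 = welford_merge(state_1, state_2)
    print(n, mean, m2 / (n - 1), m2 / n)  # 5 2.0 2.5 2.0

The result matches the first column of ``w_1.var_s`` and ``w_1.var_p`` after
the merge above.
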

Project Metadata
----------------

- Package: ``welford-remove`` 0.1 (source distribution uploaded 2024-04-23)
- License: MIT
- Author: Robert Gold (18goldr@gmail.com)
- Homepage: https://github.com/18goldr/welford-with-remove
- Keywords: statistics, online, welford