welford-remove


:Name: welford-remove
:Version: 0.1
:Home page: https://github.com/18goldr/welford-with-remove
:Summary: Python (NumPy) implementation of Welford's algorithm with the ability to remove data points.
:Upload time: 2024-04-23 03:10:57
:Author: Robert Gold
:License: MIT
:Keywords: statistics, online, welford
Welford-Remove
==============

This library is a Python (NumPy) implementation of a modified version of
Welford’s algorithm, an online and parallel algorithm for calculating
variances. Typically, Welford’s algorithm only allows for adding data
points. This modification also allows for removing data points.

Welford’s algorithm is described in the following:

-  `Wikipedia:Welford Online
   Algorithm <https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm>`__
-  `Wikipedia:Welford Parallel
   Algorithm <https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm>`__

The modification for removing data points is described here:

-  `StackOverflow
   Post <https://stackoverflow.com/questions/30876298/removing-a-prior-sample-while-using-welfords-method-for-computing-single-pass-v>`__
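
The add and remove updates can be sketched as follows. This is a minimal
illustration of the technique, not this library's source code: the class name
``RunningStats`` and its attributes are hypothetical, and the removal step
simply runs the usual Welford update in reverse.

```python
import numpy as np


class RunningStats:
    """Hypothetical helper (not the library's class): running mean and variance."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # sum of squared deviations from the mean

    def add(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    def remove(self, x):
        x = np.asarray(x, dtype=float)
        if self.n <= 1:
            # Nothing is left after the removal: reset the state
            self.n = 0
            self.mean[:] = 0.0
            self.m2[:] = 0.0
            return
        delta = x - self.mean               # deviation from the current mean
        self.n -= 1
        self.mean -= delta / self.n         # mean of the remaining samples
        self.m2 -= delta * (x - self.mean)  # undo the contribution of x

    @property
    def var_s(self):
        return self.m2 / (self.n - 1)       # sample variance


stats = RunningStats(2)
for sample in [[0, 100], [1, 110], [2, 120], [3, 130], [4, 140]]:
    stats.add(sample)
stats.remove([3, 130])
stats.remove([4, 140])
print(stats.mean)   # close to [1, 110]
print(stats.var_s)  # close to [1, 100]
```

The sketch assumes that every removed sample was previously added; removing a
value that was never added silently corrupts the statistics.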

Welford’s original method is more numerically stable than the standard
method, as described in the following blog post:

-  `Accurately computing
   running variance <https://www.johndcook.com/blog/standard_deviation/>`__
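
The difference in stability can be illustrated with a small pure-Python
comparison. The function names below are made up for this sketch, and the data
set is a common textbook-style example of values with a large offset and a
small spread.

```python
def naive_variance(data):
    """Textbook formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - (total * total) / n) / (n - 1)


def welford_variance(data):
    """Welford's update: track the mean and the sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


# Large offset, small spread: the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(naive_variance(data))    # negative in double precision (about -170.67)
print(welford_variance(data))  # 30.0
```

The naive formula subtracts two numbers of magnitude around 4e18, so the small
true result is lost to rounding, whereas Welford's update only ever works with
deviations from the running mean.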

However, there has been no formal analysis of whether the modified
version of the algorithm provided here is numerically stable. Based
on the testing done in ``test_welford.test_remove``, I have reason to
believe it is.
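
An empirical check of this kind might look like the sketch below. It is not
the contents of ``test_welford.test_remove``; the ``add`` and ``remove``
functions, the random data, and the tolerance are all assumptions chosen for
illustration. The idea is to add many samples, remove some of them, and
compare against NumPy's direct computation on the samples that remain.

```python
import numpy as np


def add(state, x):
    """Welford update for one new sample; state is (n, mean, m2)."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean = mean + delta / n
    m2 = m2 + delta * (x - mean)
    return n, mean, m2


def remove(state, x):
    """Reverse update; assumes x was added earlier and n > 1."""
    n, mean, m2 = state
    delta = x - mean
    n -= 1
    mean = mean - delta / n
    m2 = m2 - delta * (x - mean)
    return n, mean, m2


rng = np.random.default_rng(0)
samples = rng.normal(loc=50.0, scale=5.0, size=(1000, 3))

state = (0, np.zeros(3), np.zeros(3))
for x in samples:
    state = add(state, x)
for x in samples[:400]:  # remove the first 400 samples again
    state = remove(state, x)

n, mean, m2 = state
remaining = samples[400:]
mean_ok = np.allclose(mean, remaining.mean(axis=0), rtol=1e-9)
var_ok = np.allclose(m2 / (n - 1), remaining.var(axis=0, ddof=1), rtol=1e-9)
print(mean_ok, var_ok)
```

A check like this only gives evidence for the data it was run on; it does not
replace a formal error analysis.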

This library is inspired by jvf’s implementation, which does not use
the NumPy library. In particular, this implementation is a fork of the
implementation by a-mitani.

-  Implementation by jvf: https://github.com/jvf/welford
-  Implementation by a-mitani: https://github.com/a-mitani/welford

Install
-------

Download the package from the `PyPI
repository <https://pypi.org/project/welford-with-remove/>`__:

::

   $ pip install welford-remove

Example
-------

For Online Calculation
~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   import numpy as np
   from welford import Welford

   # Initialize Welford object
   w = Welford()

   # Input data samples sequentially
   w.add(np.array([0, 100]))
   w.add(np.array([1, 110]))
   w.add(np.array([2, 120]))

   # output
   print(w.mean)  # mean --> [1. 110.]
   print(w.var_s)  # sample variance --> [1. 100.]
   print(w.var_p)  # population variance --> [0.66666667 66.66666667]

   # You can add other samples after calculating variances.
   w.add(np.array([3, 130]))
   w.add(np.array([4, 140]))

   # output with added samples
   print(w.mean)  # mean --> [2. 120.]
   print(w.var_s)  # sample variance --> [2.5 250.]
   print(w.var_p)  # population variance --> [2. 200.]

   # You can remove samples after calculating variances.
   w.remove(np.array([3, 130]))
   w.remove(np.array([4, 140]))
   print(w.mean)  # mean --> [1. 110.]
   print(w.var_s)  # sample variance --> [1. 100.]
   print(w.var_p)  # population variance --> [0.66666667 66.66666667]

   # You can also get the standard deviation
   print(w.std_s)  # sample standard deviation --> [1. 10.]
   print(w.std_p)  # population standard deviation --> [0.81649658 8.16496581]

A Welford object also supports initialization with data samples and
batch addition of samples.

.. code:: python

   import numpy as np
   from welford import Welford

   # Initialize Welford object with samples.
   ini = np.array([[0, 100], [1, 110], [2, 120]])
   w = Welford(ini)

   # output
   print(w.mean)  # mean --> [1. 110.]
   print(w.var_s)  # sample variance --> [1. 100.]
   print(w.var_p)  # population variance --> [0.66666667 66.66666667]

   # add other samples through batch method
   other_samples = np.array([[3, 130], [4, 140]])
   w.add_all(other_samples)

   # output with added samples
   print(w.mean)  # mean --> [2. 120.]
   print(w.var_s)  # sample variance --> [2.5 250.]
   print(w.var_p)  # population variance --> [2. 200.]

For Parallel Calculation
~~~~~~~~~~~~~~~~~~~~~~~~

Welford objects can also be merged, which allows variances to be
calculated in parallel.

.. code:: python

   import numpy as np
   from welford import Welford

   # Initialize two Welford objects
   w_1 = Welford()
   w_2 = Welford()

   # Each object calculates the variance of its own samples independently.
   # On w_1
   w_1.add(np.array([0, 100]))
   w_1.add(np.array([1, 110]))
   w_1.add(np.array([2, 120]))
   print(w_1.var_s)  # sample variance --> [1. 100.]
   print(w_1.var_p)  # population variance --> [0.66666667 66.66666667]

   # On w_2
   w_2.add(np.array([3, 130]))
   w_2.add(np.array([4, 140]))
   print(w_2.var_s)  # sample variance --> [0.5 50.]
   print(w_2.var_p)  # population variance --> [0.25 25.]

   # You can merge objects to get the variance of all samples
   w_1.merge(w_2)
   print(w_1.var_s)  # sample variance --> [2.5 250.]
   print(w_1.var_p)  # population variance --> [2. 200.]
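
For reference, the merge step of the parallel algorithm linked above can be
sketched as below. The ``summarize`` and ``merge`` functions are hypothetical
helpers written for this illustration and are not part of this library's API;
each partial result is represented as a tuple of count, mean, and sum of
squared deviations.

```python
import numpy as np


def summarize(samples):
    """Return (count, mean, sum of squared deviations) for one chunk."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    m2 = ((samples - mean) ** 2).sum(axis=0)
    return len(samples), mean, m2


def merge(a, b):
    """Combine two partial summaries without revisiting the raw samples."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


part_1 = summarize([[0, 100], [1, 110], [2, 120]])
part_2 = summarize([[3, 130], [4, 140]])
n, mean, m2 = merge(part_1, part_2)

print(mean)          # close to [2, 120]
print(m2 / (n - 1))  # sample variance, close to [2.5, 250]
print(m2 / n)        # population variance, close to [2, 200]
```

Because only the three summary values are exchanged, each chunk can be
processed on a separate worker and merged afterwards.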

            
