:Name: omoment
:Version: 0.1.5
:Author: Tomas Protivinsky
:Homepage: https://github.com/protivinsky/omoment
:Upload time: 2023-06-16 12:35:49
:Requires Python: >=3.7
:Keywords: statistics, mean, variance, distributed, estimation, efficient, additive

|pytest-badge| |doc-badge|

..  |pytest-badge| image:: https://github.com/protivinsky/omoment/actions/workflows/pytest.yaml/badge.svg
    :alt: pytest

..  |doc-badge| image:: https://github.com/protivinsky/omoment/actions/workflows/builddoc.yaml/badge.svg
    :alt: doc
    :target: https://protivinsky.github.io/omoment/index.html

OMoment: Efficient online calculation of statistical moments
============================================================

The OMoment package calculates moments of statistical distributions (means, variances, covariances) in online or
distributed settings for univariate and bivariate distributions.

- Suitable for large data: works well with NumPy and pandas and in distributed settings.
- Moments calculated from different parts of the data can be easily combined or updated with new data (addition of
  results is supported).
- Objects are lightweight, and calculation is done in NumPy where possible.
- Weights for the data can be provided.
- Invalid values (NaNs, infinities) are omitted by default.
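
The "updated with new data" property from the list above can be illustrated with a minimal, standard-library sketch of
a weighted streaming update (often attributed to Welford and West). The class ``RunningMeanVar`` and its attributes are
hypothetical names for illustration only, not OMoment's API, and the population (biased) variance convention and the
skipping of non-positive weights are assumptions.

.. code:: python

    import math


    class RunningMeanVar:
        """Hypothetical streaming accumulator; not OMoment's actual class."""

        def __init__(self):
            self.weight = 0.0
            self.mean = 0.0
            self._m2 = 0.0  # weighted sum of squared deviations from the mean

        def update(self, x, w=1.0):
            # Skip NaNs and infinities; non-positive weights are skipped too (assumption).
            if not (math.isfinite(x) and math.isfinite(w)) or w <= 0:
                return
            self.weight += w
            delta = x - self.mean
            self.mean += delta * w / self.weight
            # Uses the deviation from the old mean and from the new mean.
            self._m2 += w * delta * (x - self.mean)

        @property
        def var(self):
            # Population (biased) weighted variance; an assumed convention.
            return self._m2 / self.weight if self.weight > 0 else math.nan


    acc = RunningMeanVar()
    for x, w in [(1.0, 1.0), (2.0, 2.0), (math.nan, 1.0), (4.0, 1.0)]:
        acc.update(x, w)

Only three numbers are stored regardless of how many observations arrive, which is what keeps such objects lightweight.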

A typical application is the calculation of means and variances (or even the correlation of two variables) for many
chunks of data, corresponding to different groups or to different parts of a distributed dataset. The results can be
analyzed at the level of the groups or easily combined to get exact moments for the full dataset.
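
Why chunk results can be combined exactly can be sketched with the standard pairwise-merge formula for weighted means
and variances. The ``Moments`` dataclass below is a hypothetical stand-in written with the standard library only; it is
not OMoment's implementation, and it assumes the population (biased) variance convention.

.. code:: python

    from dataclasses import dataclass


    @dataclass
    class Moments:
        """Hypothetical chunk summary: total weight, mean, population variance."""
        weight: float
        mean: float
        var: float

        @classmethod
        def of(cls, xs, ws):
            # Direct two-pass computation for a single chunk.
            weight = sum(ws)
            mean = sum(w * x for x, w in zip(xs, ws)) / weight
            var = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / weight
            return cls(weight, mean, var)

        def __add__(self, other):
            weight = self.weight + other.weight
            delta = other.mean - self.mean
            mean = self.mean + delta * other.weight / weight
            # Pooled within-chunk variance plus the between-chunk term.
            var = (self.weight * self.var + other.weight * other.var
                   + delta ** 2 * self.weight * other.weight / weight) / weight
            return Moments(weight, mean, var)


    xs = [1.0, 2.0, 4.0, 7.0, 11.0]
    ws = [1.0, 2.0, 1.0, 0.5, 3.0]
    full = Moments.of(xs, ws)
    merged = Moments.of(xs[:2], ws[:2]) + Moments.of(xs[2:], ws[2:])

Here ``merged`` agrees with ``full`` up to floating-point rounding, so each chunk can be summarized independently (for
example on different machines) and merged afterwards.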

Basic example
-------------

.. code:: python

    from omoment import OMeanVar
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(12354)
    g = rng.integers(low=0, high=10, size=1000)
    x = g + rng.normal(loc=0, scale=10, size=1000)
    w = rng.exponential(scale=1, size=1000)

    # calculate overall moments
    OMeanVar.compute(x, w)
    # should give: OMeanVar(mean=4.6, var=108, weight=1.08e+03)

    # or calculate moments for every group
    df = pd.DataFrame({'g': g, 'x': x, 'w': w})
    omvs = df.groupby('g').apply(OMeanVar.of_frame, x='x', w='w')

    # and combine group moments to obtain the same overall results
    OMeanVar.combine(omvs)

    # addition is also supported
    omvs.loc[0] + omvs.loc[1]

At the moment, univariate and bivariate distributions are supported. Bivariate distributions allow for fast estimation
of a linear regression with two variables (and a constant). Multivariate distributions can be processed efficiently in
a similar fashion, so support for them might be added in the future. Moments of multivariate distributions would also
allow linear regression and other statistical methods (such as PCA or regularized regression) to be estimated in a
single pass through large distributed datasets.
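
The link between bivariate moments and linear regression can be sketched as follows: the least-squares slope is the
covariance divided by the variance of the regressor, and the intercept follows from the two means. The function
``linear_fit_from_moments`` is a hypothetical helper written for illustration and is not part of OMoment's API.

.. code:: python

    def linear_fit_from_moments(mean_x, mean_y, var_x, cov_xy):
        """Least-squares fit of y = intercept + slope * x from moments alone."""
        slope = cov_xy / var_x
        intercept = mean_y - slope * mean_x
        return slope, intercept


    # Toy data lying exactly on the line y = 1 + 2 * x.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

    slope, intercept = linear_fit_from_moments(mean_x, mean_y, var_x, cov_xy)

Because only means, a variance and a covariance are needed, the regression can be computed from combined chunk
summaries without revisiting the raw data.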

Similar packages
----------------

The OMoment package aims for fast calculation of weighted distribution moments (currently mean and variance),
good compatibility with NumPy and pandas, and suitability for distributed datasets (composability of results).
I have not found a package that satisfies all of these, even though similar packages do exist.

RunStats
........

The `RunStats <https://grantjenks.com/docs/runstats/>`_ package calculates several moments of a univariate
distribution (including skewness and kurtosis) and a few other statistics (min and max), and the results can be
combined. In addition, it provides a ``Regression`` object for bivariate statistics. It does not support weights, and
the calculation was more than 100x slower in my testing (admittedly, I am not sure whether I used its Cython support
correctly).

.. code:: python

    import numpy as np
    from omoment import OMeanVar
    from runstats import Statistics
    import time

    rng = np.random.Generator(np.random.PCG64(12345))
    x = rng.normal(size=1_000_000)

    # perf_counter is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    omv = OMeanVar.compute(x)
    end = time.perf_counter()
    print(f'{end - start:.3g} seconds')
    # 0.0146 seconds

    start = time.perf_counter()
    st = Statistics(x)
    end = time.perf_counter()
    print(f'{end - start:.3g} seconds')
    # 2.83 seconds

Gym
...

`OpenAI Gym <https://github.com/openai/gym>`_ (or, more recently, `Gymnasium
<https://github.com/Farama-Foundation/Gymnasium>`_) provides similar functionality as part of its normalization of
observations and rewards (in ``gym.wrappers.normalize.RunningMeanStd``). The functionality is fairly limited because it
was developed for a particular use case, but the calculation is fast and the results can be composed. It does not
support weights, though.

Documentation
-------------

- https://protivinsky.github.io/omoment/index.html

            
