batchstats


* Name: batchstats
* Version: 0.5.2
* Summary: Efficient batch statistics computation library for Python.
* Upload time: 2025-08-13 08:49:22
* Author: Cyril Joly
* Requires Python: >=3.6
* Requirements: No requirements were recorded.

<div align="center">
  <img src="https://raw.githubusercontent.com/CyrilJl/BatchStats/main/docs/source/_static/logo_batchstats.svg" alt="Logo BatchStats" width="200">

[![PyPI Version](https://img.shields.io/pypi/v/batchstats.svg)](https://pypi.org/project/batchstats/)
[![conda Version](https://anaconda.org/conda-forge/batchstats/badges/version.svg)](https://anaconda.org/conda-forge/batchstats)
[![Documentation Status](https://img.shields.io/readthedocs/batchstats?logo=read-the-docs)](https://batchstats.readthedocs.io/en/latest/?badge=latest)
[![Unit tests](https://github.com/CyrilJl/BatchStats/actions/workflows/pytest.yml/badge.svg)](https://github.com/CyrilJl/BatchStats/actions/workflows/pytest.yml)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/59da873e81d84d9281c58c1a09bc72e9)](https://app.codacy.com/gh/CyrilJl/BatchStats/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

</div>

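``batchstats`` is a Python package for computing statistics on data that arrives in batches. It is suited to streaming data and to datasets too large to fit into memory, because each statistic is updated one batch at a time instead of requiring the full dataset at once.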

For detailed information, please check out the [full documentation](https://batchstats.readthedocs.io).

## Installation

Install ``batchstats`` using ``pip``:

```console
pip install batchstats
```

Or with `conda`:

```console
conda install -c conda-forge batchstats
```

## Quick Start

Here's how to compute the mean and variance of a dataset in batches:

```python
import numpy as np
from batchstats import BatchMean, BatchVar

# Simulate a data stream
data_stream = (np.random.randn(100, 10) for _ in range(10))

# Initialize the stat objects
batch_mean = BatchMean()
batch_var = BatchVar()

# Process each batch
for batch in data_stream:
    batch_mean.update_batch(batch)
    batch_var.update_batch(batch)

# Get the final result
mean = batch_mean()
variance = batch_var()

print(f"Mean shape: {mean.shape}")
print(f"Variance shape: {variance.shape}")
```
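
To see why batch-wise updates can reproduce full-dataset results, the following is a minimal pure-Python sketch of the well-known pairwise merge formulas for the mean and variance (Chan et al.). It is not the `batchstats` implementation: the `RunningMoments` class and its attributes are hypothetical names introduced for illustration, it works on flat lists of floats rather than arrays, and it reports the population variance (`ddof=0`) by default, which is an assumption of this sketch.

```python
import math
from statistics import fmean, pvariance


class RunningMoments:
    """Hypothetical helper: running count, mean, and sum of squared deviations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update_batch(self, batch):
        batch = list(batch)
        if not batch:
            return self
        n_b = len(batch)
        mean_b = fmean(batch)
        m2_b = sum((x - mean_b) ** 2 for x in batch)
        delta = mean_b - self.mean
        total = self.n + n_b
        # Merge the batch moments into the running moments.
        self.m2 += m2_b + delta * delta * self.n * n_b / total
        self.mean += delta * n_b / total
        self.n = total
        return self

    def variance(self, ddof=0):
        return self.m2 / (self.n - ddof)


# Feed 50 values in batches of 10 and compare with the one-shot result.
data = [float(i % 7) + 0.5 * i for i in range(50)]
moments = RunningMoments()
for start in range(0, len(data), 10):
    moments.update_batch(data[start:start + 10])

print(math.isclose(moments.mean, fmean(data)))
print(math.isclose(moments.variance(), pvariance(data)))
```

Only three numbers are kept between batches, so memory use does not grow with the number of batches processed.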

## Advanced Usage

`batchstats` handles n-dimensional `np.ndarray` inputs and allows specifying multiple axes for reduction, just like `numpy`.

```python
import numpy as np
from batchstats import BatchMean

# Create a 3D data stream
data_stream = (np.random.rand(10, 5, 8) for _ in range(5))

# Compute the mean over the last two axes (1 and 2)
batch_mean_3d = BatchMean(axis=(1, 2))

for batch in data_stream:
    batch_mean_3d.update_batch(batch)

mean_3d = batch_mean_3d()

print(f"3D Mean shape: {mean_3d.shape}")
```

## Handling NaN Values

``batchstats`` provides `BatchNan*` classes to handle `NaN` values, similar to `numpy`'s `nan*` functions.

```python
import numpy as np
from batchstats import BatchNanMean

# Create data with NaNs
data = np.random.randn(1000, 5)
data[::10] = np.nan

# Compute the mean, ignoring NaNs
nan_mean = BatchNanMean().update_batch(data)()

print(f"NaN-aware mean shape: {nan_mean.shape}")
```
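
The idea behind a NaN-aware batch mean can be sketched in plain Python: keep a running sum and a running count of the non-NaN values only, then divide at the end. This is an illustrative sketch under that assumption, not the library's code; `RunningNanMean` is a hypothetical name, and it handles flat lists of floats rather than arrays.

```python
import math


class RunningNanMean:
    """Hypothetical helper: mean over all non-NaN values seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_batch(self, batch):
        for x in batch:
            if not math.isnan(x):  # skip NaN entries entirely
                self.total += x
                self.count += 1
        return self

    def __call__(self):
        # With no valid value seen, the mean is undefined.
        return self.total / self.count if self.count else math.nan


batches = [[1.0, math.nan, 3.0], [math.nan, math.nan], [5.0]]
acc = RunningNanMean()
for b in batches:
    acc.update_batch(b)

result = acc()
print(result)
```

A batch made up entirely of NaN values leaves the accumulator unchanged, so the divisor always equals the number of valid observations.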

## Available Statistics

``batchstats`` supports a variety of common statistics:

* `BatchSum` / `BatchNanSum`
* `BatchMean` / `BatchNanMean`
* `BatchMin` / `BatchNanMin`
* `BatchMax` / `BatchNanMax`
* `BatchPeakToPeak` / `BatchNanPeakToPeak`
* `BatchVar`
* `BatchStd`
* `BatchCov`

For more details on each class, see the [API Reference](https://batchstats.readthedocs.io/en/latest/api.html).
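
As an illustration of how statistics such as the minimum, maximum, and peak-to-peak range lend themselves to batch updates, here is a short pure-Python sketch. It assumes peak-to-peak means maximum minus minimum, as in `numpy.ptp`; the `RunningExtrema` class is a hypothetical name and does not reflect how `batchstats` is implemented.

```python
class RunningExtrema:
    """Hypothetical helper: running minimum, maximum, and peak-to-peak range."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def update_batch(self, batch):
        for x in batch:
            self.lo = x if self.lo is None else min(self.lo, x)
            self.hi = x if self.hi is None else max(self.hi, x)
        return self

    def peak_to_peak(self):
        # Range of all values seen so far: max - min.
        return self.hi - self.lo


ext = RunningExtrema()
for b in [[3, 7, 1], [4, 9], [2]]:
    ext.update_batch(b)

print(ext.lo, ext.hi, ext.peak_to_peak())
```

Only the two extrema need to be stored between batches, since the range can be derived from them at any time.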

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "batchstats",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Cyril Joly",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/84/15/be59e38f81fe562666e14537288aaa92d33cec3b0e9a0fa3f04c11bad193/batchstats-0.5.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Efficient batch statistics computation library for Python.",
    "version": "0.5.2",
    "project_urls": {
        "Homepage": "https://github.com/CyrilJl/BatchStats"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d9139537a516e547309f42b047515b5e4b97be07903d6e787c794850c65be864",
                "md5": "f4d653d6a6d265a62badcd4464568ac7",
                "sha256": "9bb0a5c37e0788834ddff15895654f11936ad5a4c951f720817e33024e081df1"
            },
            "downloads": -1,
            "filename": "batchstats-0.5.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f4d653d6a6d265a62badcd4464568ac7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 25541,
            "upload_time": "2025-08-13T08:49:21",
            "upload_time_iso_8601": "2025-08-13T08:49:21.667067Z",
            "url": "https://files.pythonhosted.org/packages/d9/13/9537a516e547309f42b047515b5e4b97be07903d6e787c794850c65be864/batchstats-0.5.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8415be59e38f81fe562666e14537288aaa92d33cec3b0e9a0fa3f04c11bad193",
                "md5": "2b1323491192aee7ede865ca8fb0b132",
                "sha256": "9c8e0cfbc4d8fb5f2c2886243bcffa12164a004faceebf56f212f252806a8756"
            },
            "downloads": -1,
            "filename": "batchstats-0.5.2.tar.gz",
            "has_sig": false,
            "md5_digest": "2b1323491192aee7ede865ca8fb0b132",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 15056,
            "upload_time": "2025-08-13T08:49:22",
            "upload_time_iso_8601": "2025-08-13T08:49:22.736186Z",
            "url": "https://files.pythonhosted.org/packages/84/15/be59e38f81fe562666e14537288aaa92d33cec3b0e9a0fa3f04c11bad193/batchstats-0.5.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-13 08:49:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "CyrilJl",
    "github_project": "BatchStats",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "batchstats"
}
        