foostrap

- Name: foostrap
- Version: 1.1.1
- Summary: Fast Bootstrap statistics using Numba
- Upload time: 2024-03-19 12:57:25
- Requires Python: >=3.9
- Keywords: ab testing, bootstrap, confidence interval

# Foostrap: Fast Bootstrap Resampling

## Overview
Foostrap is a simple Python library for efficient bootstrap resampling and confidence interval estimation.

## Features

- Parallel by default using Numba. Typically at least 4x faster than the current SciPy bootstrap. See the [benchmark notebook](https://github.com/japlete/foostrap/blob/main/benchmark.ipynb).
- Implements the Bias-Corrected and Accelerated (BCa) method for CI estimation. Can also use percentiles.
- Optimized for sparse and binary data. The number of zeros is drawn from a Binomial distribution, instead of resampling them individually.
- Supported statistics:
    - For 1-dimensional data: mean, standard deviation, quantile q
    - For 2-dimensional paired data: ratio of sums, weighted mean and Pearson correlation
- Robust: unit tests validate edge cases and check that results agree with the SciPy bootstrap to within 2 decimal places.
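
The sparse-data idea from the feature list can be illustrated with plain NumPy. This is only a sketch of the general technique, not Foostrap's actual implementation: the function name `sparse_bootstrap_means` and its arguments are made up for illustration. The number of zeros in each resample comes from a single Binomial draw, and only the non-zero values are resampled individually.

```python
import numpy as np

def sparse_bootstrap_means(x, boot_samples=1000, seed=0):
    """Bootstrap the mean of a 1-D array with many zeros (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    nonzero = x[x != 0]
    p_zero = 1.0 - nonzero.size / n
    # One Binomial draw per resample gives the number of zeros it contains
    n_zeros = rng.binomial(n, p_zero, size=boot_samples)
    means = np.empty(boot_samples)
    for b in range(boot_samples):
        k = n - n_zeros[b]  # number of non-zero values in this resample
        # Only the non-zero part is resampled element by element
        means[b] = rng.choice(nonzero, size=k, replace=True).sum() / n
    return means

# Hypothetical sparse data: 95% zeros
x = np.zeros(1000)
x[:50] = np.random.default_rng(1).exponential(size=50)
means = sparse_bootstrap_means(x)
print(means.mean(), x.mean())
```

With 95% zeros, each resample draws roughly 50 values instead of 1000, which is where the saving comes from.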

## Installation

```
pip install foostrap
```

or optionally, if you also want icc-rt as recommended by Numba:

```
pip install "foostrap[iccrt]"
```

## Usage

The `foostrap` function can take 1 sample or 2 independent samples for comparison. The comparison is always the difference `statistic(sample 1) - statistic(sample 2)`.

If no statistic is specified, the mean is used for 1-D samples and the ratio of sums for 2-D samples.
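
To make the two-sample comparison concrete, here is a minimal NumPy sketch of bootstrapping `statistic(sample 1) - statistic(sample 2)`. It is not Foostrap's code: `bootstrap_difference` is a hypothetical helper, and it assumes each sample is resampled independently with replacement.

```python
import numpy as np

def bootstrap_difference(x1, x2, statistic=np.mean, boot_samples=2000, seed=0):
    """Bootstrap distribution of statistic(x1) - statistic(x2) (illustrative only)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(boot_samples)
    for b in range(boot_samples):
        # The samples are independent, so each is resampled on its own
        r1 = rng.choice(x1, size=x1.size, replace=True)
        r2 = rng.choice(x2, size=x2.size, replace=True)
        diffs[b] = statistic(r1) - statistic(r2)
    return diffs

rng = np.random.default_rng(42)
x1 = rng.normal(loc=1.0, size=200)
x2 = rng.normal(loc=0.0, size=200)
diffs = bootstrap_difference(x1, x2)
print(diffs.mean(), x1.mean() - x2.mean())
```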

### Example

```python
import numpy as np
from foostrap import foostrap

# Generate some data
x1 = np.random.normal(size=100)

# Performing bootstrap resampling (1-sample mean)
result = foostrap(x1)

# Displaying the confidence interval tuple
print(result.ci)
```

### Parameters

- `x1` (numpy.ndarray): Primary sample array. If observations are paired, the shape must be a 2-column array.
- `x2` (numpy.ndarray, optional): Second sample to compare against x1. Default is None.
- `statistic` (one of `'mean','std','quantile','ratio','wmean','pearson','auto'`): The statistic to compute over each sample. Default `'auto'` (see above).
- `q` (float): Probability for the `'quantile'` statistic. Ignored otherwise. Default 0.5 (the median).
- `boot_samples` (int): Number of bootstrap samples to generate. Default 10000.
- `conf_lvl` (float): Confidence level for the interval estimation. Default 0.95.
- `alternative` (str): Type of confidence interval. `'two-sided'`: upper and lower bound, `'less'`: only upper bound, `'greater'`: only lower bound. Default `'two-sided'`.
- `ci_method` (str): Method for CI estimation, `'BCa'` (default) or `'percentile'`.
- `random_state` (int, numpy Generator or SeedSequence): For reproducibility.
- `parallel` (bool): Whether to use parallel processing. Default True.
- `ignore_sparse_below` (float): Threshold under which sparse data is treated as dense, to avoid the overhead of a separate sampling. Default 0.1.
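
As a rough illustration of how `conf_lvl` and `alternative` relate to a percentile interval, the sketch below maps them to quantiles of a bootstrap distribution. The helper `percentile_ci` is hypothetical, and using infinite values for the missing bound of a one-sided interval is an assumption; Foostrap's actual return values may differ.

```python
import numpy as np

def percentile_ci(boot_stats, conf_lvl=0.95, alternative="two-sided"):
    """Percentile interval from bootstrap statistics (illustrative only)."""
    alpha = 1.0 - conf_lvl
    if alternative == "two-sided":
        lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
        return float(lo), float(hi)
    if alternative == "less":
        # Only an upper bound: all of alpha goes to the upper tail
        return -np.inf, float(np.quantile(boot_stats, conf_lvl))
    if alternative == "greater":
        # Only a lower bound: all of alpha goes to the lower tail
        return float(np.quantile(boot_stats, alpha)), np.inf
    raise ValueError(f"unknown alternative: {alternative!r}")

# Stand-in for a bootstrap distribution
boot_stats = np.random.default_rng(0).normal(size=10_000)
two_sided = percentile_ci(boot_stats)
upper_only = percentile_ci(boot_stats, alternative="less")
lower_only = percentile_ci(boot_stats, alternative="greater")
print(two_sided, upper_only, lower_only)
```

At the same confidence level, a one-sided bound is tighter than the matching end of the two-sided interval because the whole error probability sits in one tail.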

### Returns

A data class containing the confidence interval (`ci`) as a tuple and the bootstrap samples (`boot_samples`) as a numpy array.

### Notes
1. The first execution takes a few seconds longer because Numba has to compile the functions. The compiled functions are cached in the `__pycache__` folder of the library installation directory. You can save the cached functions and reuse them on another machine, as long as it has the same package versions and CPU.
2. Each thread gets a separate random generator, spawned from the user-supplied generator or the default one. This means that for the results to be reproducible, the number of CPU cores must remain constant.
3. Only the 1-D statistics have the sparse and binary data optimization, since paired data typically doesn't have zeros in both values of an observation.
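
The reproducibility caveat in note 2 can be demonstrated with NumPy's `SeedSequence.spawn`. This sketch only mimics the idea of one generator per thread and runs sequentially. The function `parallel_style_draws` is hypothetical, and how Foostrap actually splits work across threads is an assumption here.

```python
import numpy as np

def parallel_style_draws(seed, n_threads, total=8):
    """Split `total` random draws across per-thread generators (illustrative only)."""
    seed_seq = np.random.SeedSequence(seed)
    # One independent child generator per (simulated) thread
    gens = [np.random.default_rng(child) for child in seed_seq.spawn(n_threads)]
    per_thread = total // n_threads
    return np.concatenate([g.random(per_thread) for g in gens])

a = parallel_style_draws(123, n_threads=4)
b = parallel_style_draws(123, n_threads=4)
c = parallel_style_draws(123, n_threads=2)

print(np.array_equal(a, b))  # same seed, same thread count
print(np.array_equal(a, c))  # same seed, different thread count
```

The same seed gives identical draws only when the thread count is also the same, because the work is partitioned differently across the spawned generators.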

### More examples
```python
import numpy as np
from foostrap import foostrap

# Generate some data
x1 = np.random.normal(size=100)
x2 = np.random.normal(size=100)

# Bootstrap median(x1) - median(x2)
result = foostrap(x1, x2, statistic='quantile', q=0.5)

# Display the confidence interval tuple
print(result.ci)

# Generate 2-column correlated data
x1 = np.random.normal(size=(100, 2))
x1[:, 1] += x1[:, 0]

# Bootstrap the Pearson correlation coefficient
result = foostrap(x1, statistic='pearson')

# Display the confidence interval tuple
print(result.ci)
```
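
For paired data, a bootstrap typically resamples whole rows so that the two values of an observation stay together. The NumPy sketch below shows this for a ratio of sums. It is not Foostrap's implementation: `bootstrap_ratio_of_sums` is a hypothetical helper, and treating column 0 as the numerator and column 1 as the denominator is an assumption.

```python
import numpy as np

def bootstrap_ratio_of_sums(xy, boot_samples=2000, seed=0):
    """Bootstrap sum(col 0) / sum(col 1) over paired rows (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = xy.shape[0]
    # Resample row indices so each pair is kept intact
    idx = rng.integers(0, n, size=(boot_samples, n))
    sums = xy[idx].sum(axis=1)  # shape (boot_samples, 2)
    return sums[:, 0] / sums[:, 1]

# Hypothetical paired data with a true ratio near 0.5
rng = np.random.default_rng(7)
denominator = rng.uniform(1.0, 2.0, size=300)
numerator = 0.5 * denominator + rng.normal(scale=0.1, size=300)
xy = np.column_stack([numerator, denominator])

ratios = bootstrap_ratio_of_sums(xy)
print(np.quantile(ratios, [0.025, 0.975]))
```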

## Contributing

If you need support for other statistics, note that each current statistic has optimized functions for the sampling and jackknife steps, so any new statistic will also need its own specialized functions.
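
To show what a specialized jackknife function can look like, the sketch below compares a generic leave-one-out loop with a closed-form version for the mean, and then computes the standard BCa acceleration constant from the jackknife values. All function names are hypothetical and the code is not taken from Foostrap.

```python
import numpy as np

def jackknife_generic(x, statistic):
    """Leave-one-out estimates by recomputing the statistic n times."""
    return np.array([statistic(np.delete(x, i)) for i in range(x.size)])

def jackknife_mean(x):
    """Specialized leave-one-out means, computed in a single pass."""
    return (x.sum() - x) / (x.size - 1)

def bca_acceleration(jack):
    """Standard BCa acceleration constant from jackknife estimates."""
    d = jack.mean() - jack
    return (d**3).sum() / (6.0 * (d**2).sum() ** 1.5)

x = np.random.default_rng(0).exponential(size=200)
generic = jackknife_generic(x, np.mean)
fast = jackknife_mean(x)
accel = bca_acceleration(fast)
print(np.allclose(generic, fast), accel)
```

The generic loop costs O(n²) for the mean, while the specialized version is O(n), which is the kind of saving a new statistic would need to provide as well.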

## License

Foostrap is released under the MIT License. See the LICENSE file for more details.
