fastnanquantile


- Name: fastnanquantile
- Version: 0.0.1
- Summary: A package for fast nanquantile calculation
- Upload time: 2024-01-08 23:12:04
- Requires Python: >=3.8
- Keywords: numpy, bottleneck, xarray
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![Tests](https://github.com/lbferreira/fastnanquantile/actions/workflows/tests.yml/badge.svg)
[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lbferreira/fastnanquantile/blob/main)
# fastnanquantile
An alternative implementation of NumPy's `nanquantile` function that is faster in many cases, especially for 2D and 3D arrays.
Note that `np.quantile` is much faster than `np.nanquantile`, but it does not skip NaN values: if a slice contains a NaN, the result for that slice is NaN. This package is intended for data that contain NaN values.
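The difference in NaN handling is easy to see on a small array. This snippet uses only NumPy: `np.quantile` propagates the NaN, while `np.nanquantile` drops it before computing the quantile.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

# np.quantile propagates NaN: a single NaN makes the result NaN.
print(np.quantile(data, 0.5))     # nan
# np.nanquantile ignores the NaN and uses only [1.0, 2.0, 4.0].
print(np.nanquantile(data, 0.5))  # 2.0
```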

## Installation
To install the package from GitHub, run:
```
pip install git+https://github.com/lbferreira/fastnanquantile
```

## Usage
`fnq.nanquantile` is designed to be very similar to NumPy's `nanquantile` function.
Example:
```python
import time

import numpy as np
import fastnanquantile as fnq

sample_data = np.random.random((50, 100, 100))

start = time.time()
np_result = np.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for np.nanquantile: {time.time() - start}s')
# Printed: Time for np.nanquantile: 0.2658s

start = time.time()
fnq_result = fnq.nanquantile(sample_data, q=0.6, axis=0)
print(f'Time for fnq.nanquantile: {time.time() - start}s')
# Printed: Time for fnq.nanquantile: 0.0099s
# Disclaimer: The first call to fnq.nanquantile is slower than
# subsequent calls because the function is compiled on first use.
```
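To make the expected semantics concrete, here is a minimal pure-NumPy sketch of what a NaN-ignoring quantile along an axis computes, assuming NumPy's default `linear` interpolation method. It is not fastnanquantile's implementation (which is compiled and far faster); `reference_nanquantile_1d` and `reference_nanquantile` are hypothetical helpers used only for illustration.

```python
import numpy as np

def reference_nanquantile_1d(values, q):
    """Quantile of a 1-D array ignoring NaNs, using linear interpolation
    (assumed to match np.nanquantile's default method)."""
    finite = np.sort(values[~np.isnan(values)])
    if finite.size == 0:
        return np.nan  # all-NaN slice
    pos = (finite.size - 1) * q  # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, finite.size - 1)
    frac = pos - lo
    return finite[lo] + (finite[hi] - finite[lo]) * frac

def reference_nanquantile(data, q, axis=0):
    """Apply the 1-D helper along `axis` (slow pure-Python loop)."""
    return np.apply_along_axis(reference_nanquantile_1d, axis, data, q)

rng = np.random.default_rng(0)
data = rng.random((50, 20, 20))
data[rng.random(data.shape) < 0.2] = np.nan  # ~20% missing values

ours = reference_nanquantile(data, q=0.6, axis=0)
expected = np.nanquantile(data, q=0.6, axis=0)
print(np.allclose(ours, expected, equal_nan=True))  # True
```

The result has the reduction axis removed (shape `(20, 20)` here), just as with `np.nanquantile(..., axis=0)`.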

## Xarray compatible function
Xarray is a powerful library for working with labeled multidimensional arrays, and it can compute quantiles along a given dimension of a DataArray, using NumPy's `nanquantile` function under the hood. To extend fastnanquantile to xarray, the package provides a function that computes quantiles for a DataArray and behaves similarly to xarray's quantile implementation.
Example:
```python
import numpy as np
import xarray as xr
from fastnanquantile import xrcompat

da = xr.DataArray(
    np.random.rand(10, 1000, 1000),
    coords={"time": np.arange(10), "x": np.arange(1000), "y": np.arange(1000)},
)

# Xarray quantile (time to run: ~25s)
result_xr = da.quantile(q=0.6, dim="time")
# fastnanquantile (time to run: <1s)
result_fnq = xrcompat.xr_apply_nanquantile(da, q=0.6, dim="time")
# Check if results are equal (If results are different, an error will be raised)
np.testing.assert_almost_equal(result_fnq.values, result_xr.values, decimal=4)
```
A case study using Xarray + Dask to create time composites from satellite images can be found in this notebook: [examples/example_xarray.ipynb](examples/example_xarray.ipynb).
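Conceptually, computing a quantile along a named dimension comes down to translating the dimension name into a positional axis and calling a NumPy-style quantile function on the underlying array. The sketch below shows that idea with plain NumPy. `quantile_along_dim` is a hypothetical helper, not the package's `xrcompat` implementation, and it does not rebuild coordinates or handle Dask-backed arrays as a real DataArray wrapper would need to.

```python
import numpy as np

def quantile_along_dim(values, dims, dim, q, quantile_func=np.nanquantile):
    """Reduce `values` along the dimension named `dim`.

    `dims` is the tuple of dimension names for `values` (e.g. a DataArray's
    `.dims`). `quantile_func` is any function with np.nanquantile's
    (a, q, axis) interface, for example fnq.nanquantile.
    """
    axis = dims.index(dim)  # map the dimension name to a positional axis
    result = quantile_func(values, q=q, axis=axis)
    remaining_dims = tuple(d for d in dims if d != dim)
    return result, remaining_dims

values = np.random.default_rng(0).random((10, 4, 5))
values[0, 0, 0] = np.nan  # a missing observation is simply skipped
result, dims = quantile_along_dim(values, ("time", "x", "y"), "time", q=0.6)
print(result.shape, dims)  # (4, 5) ('x', 'y')
```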

## Benchmarks
Benchmarks comparing the performance of fastnanquantile with NumPy's `nanquantile` function are available in this notebook: [examples/example.ipynb](examples/example.ipynb).
![Benchmark: 1D arrays](./docs/benchmarks/benchmark_1d_array.png)
![Benchmark: 2D arrays](./docs/benchmarks/benchmark_2d_array.png)
![Benchmark: 3D arrays](./docs/benchmarks/benchmark_3d_array.png)

**Benchmark conclusions**

The performance gains offered by fastnanquantile depend on the shape of the input array.
Based on the benchmark results:
- 1D arrays: NumPy is faster.
- 2D arrays: fastnanquantile is faster for arrays whose axis sizes differ noticeably from each other (e.g. shape (50, 1000)).
- 3D arrays: fastnanquantile is generally faster, especially when the reduction axis is smaller than the other axes. For example, with shape=(50, 1000, 1000) and reduction axis=0, fastnanquantile is much faster than NumPy.
- Overall, fastnanquantile can be a great alternative in many cases, especially for 2D and 3D arrays, with the potential to greatly speed up quantile computation.
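Because the first call to `fnq.nanquantile` includes compilation time, a fair comparison on your own array shapes should exclude warm-up calls. Below is a minimal timing sketch under that assumption; `best_time` is a hypothetical helper, and only the NumPy side is executed so the snippet runs without fastnanquantile installed.

```python
import time

import numpy as np

def best_time(func, *args, repeat=5, warmup=1, **kwargs):
    """Return the best wall-clock time (s) over `repeat` runs of func(*args, **kwargs).

    Warm-up calls are not timed, so one-time costs such as compilation
    on the first call do not distort the measurement.
    """
    for _ in range(warmup):
        func(*args, **kwargs)
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)

rng = np.random.default_rng(42)
for shape in [(1000,), (50, 1000), (50, 100, 100)]:
    data = rng.random(shape)
    data[rng.random(shape) < 0.1] = np.nan  # inject ~10% NaNs
    t_np = best_time(np.nanquantile, data, q=0.6, axis=0, repeat=3)
    print(f"{shape}: np.nanquantile best of 3 = {t_np:.4f}s")
    # To compare, time fastnanquantile the same way:
    # t_fnq = best_time(fnq.nanquantile, data, q=0.6, axis=0, repeat=3)
```

Taking the minimum of several runs reduces noise from other processes; absolute numbers will vary by machine.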

## Acknowledgements
This library was developed as part of my research work in the [GCER lab](https://www.gcerlab.com/), under the supervision of Vitor Martins, at Mississippi State University (MSU).

This research is funded by USDA NIFA (award #2023-67019-39169), supporting Lucas Ferreira and Vitor Martins at MSU.

            
