outlier-utils


:Name: outlier-utils
:Version: 0.0.5
:Summary: Utility library for detecting and removing outliers from normally distributed datasets
:Upload time: 2023-08-31 16:10:26
:License: MIT License
:Requirements: numpy, scipy, flake8, black, isort, pytest, pandas

=============
outlier-utils
=============

Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs_ test.

Requirements
------------

- Python_ (version 3.8 or later)
- SciPy_
- NumPy_

Overview
--------

Both the two-sided and the one-sided versions of the test are supported. The two-sided test can extract outliers from both ends of the dataset, whereas the one-sided tests consider only one end: either the minimum or the maximum. A test is applied repeatedly, removing outliers until none can be found in the dataset. By default, a test returns the outlier-free data, but it can also return the outliers themselves or their indices in the original dataset.
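
To make the repeated removal concrete, here is a minimal sketch of a two-sided Grubbs loop written directly against NumPy and SciPy. It is not the library's own code: the names ``grubbs_critical_value`` and ``remove_outliers_two_sided`` are hypothetical, and the critical value uses the textbook formula based on the Student t distribution, which is assumed (not confirmed) to match what outlier-utils computes internally.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n, alpha, two_sided=True):
    """Textbook critical value of the Grubbs statistic for sample size n."""
    # A two-sided test splits alpha across both tails of the t distribution.
    tail = alpha / (2 * n) if two_sided else alpha / n
    t = stats.t.isf(tail, n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))


def remove_outliers_two_sided(data, alpha=0.05):
    """Hypothetical helper: drop the most extreme point while it fails the test."""
    values = np.asarray(data, dtype=float)
    # The test needs at least three points (n - 2 degrees of freedom).
    while values.size > 2:
        std = values.std(ddof=1)  # sample standard deviation
        if std == 0:
            break  # all remaining values are identical
        deviations = np.abs(values - values.mean())
        candidate = int(np.argmax(deviations))
        g = deviations[candidate] / std  # Grubbs statistic
        if g <= grubbs_critical_value(values.size, alpha):
            break  # most extreme point is not significant: stop
        values = np.delete(values, candidate)
    return values


cleaned = remove_outliers_two_sided([1, 8, 9, 10, 9], alpha=0.05)
print(cleaned)
```

For the sample ``[1, 8, 9, 10, 9]`` the first pass gives a statistic of about 1.755 against a critical value of about 1.715, so ``1`` is dropped; the second pass finds nothing significant and the loop stops with 8, 9, 10, 9 remaining.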

Examples
--------

- Two-sided Grubbs test with a Pandas series input

::

   >>> from outliers import smirnov_grubbs as grubbs
   >>> import pandas as pd
   >>> data = pd.Series([1, 8, 9, 10, 9])
   >>> grubbs.test(data, alpha=0.05)
   1     8
   2     9
   3    10
   4     9
   dtype: int64
   
- Two-sided Grubbs test with a NumPy array input

::

   >>> import numpy as np
   >>> data = np.array([1, 8, 9, 10, 9])
   >>> grubbs.test(data, alpha=0.05)
   array([ 8,  9, 10,  9])
   
- One-sided (min) test returning outlier indices

::

   >>> grubbs.min_test_indices([8, 9, 10, 1, 9], alpha=0.05)
   [3]
   
- One-sided (max) tests returning outliers

::

   >>> grubbs.max_test_outliers([8, 9, 10, 1, 9], alpha=0.05)
   []
   >>> grubbs.max_test_outliers([8, 9, 10, 50, 9], alpha=0.05)
   [50]
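
The one-sided examples above report indices relative to the original input, even though points are removed between passes. The sketch below shows one way that bookkeeping could work. It is an illustration only: ``one_sided_outlier_indices`` and its ``side`` parameter are hypothetical names, not part of outlier-utils, and the critical value is the textbook one-sided Grubbs formula.

```python
import numpy as np
from scipy import stats


def one_sided_outlier_indices(data, alpha=0.05, side="max"):
    """Hypothetical helper: original indices of one-sided Grubbs outliers."""
    if side not in ("min", "max"):
        raise ValueError("side must be 'min' or 'max'")
    values = np.asarray(data, dtype=float)
    remaining = np.arange(values.size)  # original positions still under test
    found = []
    while remaining.size > 2:
        subset = values[remaining]
        n = subset.size
        std = subset.std(ddof=1)
        if std == 0:
            break  # no spread left, nothing can be an outlier
        if side == "max":
            pos = int(np.argmax(subset))
            g = (subset[pos] - subset.mean()) / std
        else:
            pos = int(np.argmin(subset))
            g = (subset.mean() - subset[pos]) / std
        # One-sided test: the whole alpha goes into a single tail.
        t = stats.t.isf(alpha / n, n - 2)
        critical = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= critical:
            break
        found.append(int(remaining[pos]))  # map back to the original index
        remaining = np.delete(remaining, pos)
    return found


print(one_sided_outlier_indices([8, 9, 10, 1, 9], side="min"))
print(one_sided_outlier_indices([8, 9, 10, 50, 9], side="max"))
```

Keeping an array of surviving positions, rather than deleting from the data itself, is what lets the result refer to the caller's original indexing.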


.. _Smirnov-Grubbs: https://en.wikipedia.org/wiki/Grubbs%27_test_for_outliers
.. _SciPy: https://www.scipy.org/
.. _NumPy: http://www.numpy.org/
.. _Python: https://www.python.org/


License
-------

This software is licensed under the MIT License.


            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "outlier-utils",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "",
    "author_email": "Masashi Shibata <contact@c-bata.link>",
    "download_url": "https://files.pythonhosted.org/packages/29/3a/73493f0d4ee662798b27b0287d4372d99d3339ba5c3801caa14d5bf4d26d/outlier_utils-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Utility library for detecting and removing outliers from normally distributed datasets",
    "version": "0.0.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/c-bata/outlier-utils/issues",
        "Homepage": "https://github.com/c-bata/outlier-utils",
        "Sources": "https://github.com/c-bata/outlier-utils"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5702281e0d898e50138b4275d8f2840d5b6bd41b276cb697dd56fd56ac91262c",
                "md5": "fc28198aec5a8d9fd722bbd896e3c725",
                "sha256": "2e16148a3fa7b2e16ad0a3b75d8c8920828b5cc11568795782d597d4cfb0b194"
            },
            "downloads": -1,
            "filename": "outlier_utils-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fc28198aec5a8d9fd722bbd896e3c725",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 5143,
            "upload_time": "2023-08-31T16:10:25",
            "upload_time_iso_8601": "2023-08-31T16:10:25.767920Z",
            "url": "https://files.pythonhosted.org/packages/57/02/281e0d898e50138b4275d8f2840d5b6bd41b276cb697dd56fd56ac91262c/outlier_utils-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "293a73493f0d4ee662798b27b0287d4372d99d3339ba5c3801caa14d5bf4d26d",
                "md5": "7018000d4a64e8ea0b96a0a0d45e130b",
                "sha256": "16e46fa6f7b01fe5518ea73fc15d3de0e30091750c428760bbe7dde2c9590579"
            },
            "downloads": -1,
            "filename": "outlier_utils-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "7018000d4a64e8ea0b96a0a0d45e130b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 6201,
            "upload_time": "2023-08-31T16:10:26",
            "upload_time_iso_8601": "2023-08-31T16:10:26.679516Z",
            "url": "https://files.pythonhosted.org/packages/29/3a/73493f0d4ee662798b27b0287d4372d99d3339ba5c3801caa14d5bf4d26d/outlier_utils-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-31 16:10:26",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "c-bata",
    "github_project": "outlier-utils",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "flake8",
            "specs": []
        },
        {
            "name": "black",
            "specs": []
        },
        {
            "name": "isort",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        }
    ],
    "lcname": "outlier-utils"
}
        