# OutlierIdentifiers
## In brief
This is a Python package of 1D outlier identifier functions.
It closely follows the Wolfram Language (WL) paclet [AAp1], the R package [AAp2], and the Raku package [AAp3].

------
## Installation
From PyPI.org:
```shell
python3 -m pip install OutlierIdentifiers
```
From GitHub:
```shell
python3 -m pip install git+https://github.com/antononcube/Python-packages.git#egg=OutlierIdentifiers\&subdirectory=OutlierIdentifiers
```
------
## Usage examples
Load packages:
```python
import numpy as np
import plotly.graph_objects as go
from OutlierIdentifiers import *
```
Generate a vector with random numbers:
```python
np.random.seed(14)
vec = np.random.normal(loc=10, scale=20, size=30)
print(vec)
```
    [ 41.02678223  11.58372049  13.47953057   8.55326868 -30.086588
      12.89355626 -20.02337245  14.22218902  -1.16410111  31.6905813
       6.27421752  10.2932275  -11.51138939  22.84504148   6.39326577
      22.40600507  26.21948669  25.55871733   5.25020644 -27.83824691
     -13.44243588  26.72413943  30.18546801  35.86198722  -0.98662331
      -9.6342573   28.29345516  27.46140757  10.44222283   9.91712833]
Plot the vector:
```python
# Create a scatter plot with markers
fig = go.Figure(data=go.Scatter(y=vec, mode='markers'))
# Add labels and title
fig.update_layout(title='Vector of Numbers', xaxis_title='Index', yaxis_title='Value', template = "plotly_dark")
# Display the plot
fig.show()
```
Find outlier positions:
```python
outlier_identifier(vec, identifier=hampel_identifier_parameters)
```
    array([ True, False, False, False,  True, False,  True, False, False,
            True, False, False,  True, False, False, False, False, False,
           False,  True,  True, False, False,  True, False,  True, False,
           False, False, False])
Find outlier values:
```python
outlier_identifier(vec, identifier=hampel_identifier_parameters, value = True)
```
    array([ 41.02678223, -30.086588  , -20.02337245,  31.6905813 ,
           -11.51138939, -27.83824691, -13.44243588,  35.86198722,
            -9.6342573 ])
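
The Hampel-based results above can be reproduced with a short standard-library sketch. It assumes the thresholds are the median plus or minus 1.4826 times the median absolute deviation (MAD); that formula is an assumption, chosen because it matches the thresholds printed later in this README, and the package's own implementation may differ in detail. The helper name `hampel_thresholds_sketch` is hypothetical and not part of the package.

```python
import statistics

# The vector printed earlier in this README (values as printed, 8 decimals).
vec = [41.02678223, 11.58372049, 13.47953057, 8.55326868, -30.086588,
       12.89355626, -20.02337245, 14.22218902, -1.16410111, 31.6905813,
       6.27421752, 10.2932275, -11.51138939, 22.84504148, 6.39326577,
       22.40600507, 26.21948669, 25.55871733, 5.25020644, -27.83824691,
       -13.44243588, 26.72413943, 30.18546801, 35.86198722, -0.98662331,
       -9.6342573, 28.29345516, 27.46140757, 10.44222283, 9.91712833]


def hampel_thresholds_sketch(data):
    """Hypothetical helper: (median - 1.4826*MAD, median + 1.4826*MAD)."""
    med = statistics.median(data)
    # Median absolute deviation from the median.
    mad = statistics.median(abs(x - med) for x in data)
    half_width = 1.4826 * mad  # assumed scaling constant
    return med - half_width, med + half_width


lower, upper = hampel_thresholds_sketch(vec)

# A point is flagged when it falls outside the (lower, upper) interval.
mask = [x < lower or x > upper for x in vec]
values = [x for x, flagged in zip(vec, mask) if flagged]

print((lower, upper))
print(values)
```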
Find *top* outlier positions and values:
```python
outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)))
```
    array([ True, False, False, False, False, False, False, False, False,
            True, False, False, False, False, False, False, False, False,
           False, False, False, False, False,  True, False, False, False,
           False, False, False])
```python
outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)), value=True)
```
    array([41.02678223, 31.6905813 , 35.86198722])
Find *bottom* outlier positions and values (using quartiles-based identifier):
```python
outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
```
    array([False, False, False, False,  True, False,  True, False, False,
           False, False, False, False, False, False, False, False, False,
           False,  True, False, False, False, False, False, False, False,
           False, False, False])
```python
outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)), value=True)
```
    array([-30.086588  , -20.02337245, -27.83824691])
Here is another way to get the outlier values:
```python
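# Compute the boolean mask of bottom outliers, then index the vector with it
pred = outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
vec[pred]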
```
    array([-30.086588  , -20.02337245, -27.83824691])
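
One way to read the *top* and *bottom* variants is as one-sided thresholds: the top variant ignores the lower threshold and the bottom variant ignores the upper one. The sketch below illustrates that reading using the threshold pairs reported in this README. The helpers `top_only`, `bottom_only`, and `outside` are hypothetical names, and representing the open side with infinity is an assumption; the package's `top_outliers` and `bottom_outliers` may represent it differently.

```python
import math


def top_only(thresholds):
    """Hypothetical helper: keep only the upper threshold."""
    _, upper = thresholds
    return (-math.inf, upper)


def bottom_only(thresholds):
    """Hypothetical helper: keep only the lower threshold."""
    lower, _ = thresholds
    return (lower, math.inf)


def outside(data, thresholds):
    """Flag the points that fall outside the (lower, upper) interval."""
    lower, upper = thresholds
    return [x < lower or x > upper for x in data]


# Threshold pairs as reported in this README for the example vector.
hampel = (-8.796653643076334, 30.822596969354976)
quartile = (-14.46873856125025, 36.49468188752889)

# A few values taken from the example vector.
sample = [41.02678223, 11.58372049, -30.086588, -13.44243588, 31.6905813]

top_mask = outside(sample, top_only(hampel))
bottom_mask = outside(sample, bottom_only(quartile))

print(top_mask)     # [True, False, False, False, True]
print(bottom_mask)  # [False, False, True, False, False]
```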
The available outlier-parameter functions, each of which returns a pair of lower and upper thresholds, are:
- `hampel_identifier_parameters`
- `splus_quartile_identifier_parameters`
- `quartile_identifier_parameters`
```python
[ f(vec) for f in (hampel_identifier_parameters, splus_quartile_identifier_parameters, quartile_identifier_parameters)]
```
    [(-8.796653643076334, 30.822596969354976),
     (-37.649981209714, 64.27685968784428),
     (-14.46873856125025, 36.49468188752889)]
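
The two quartile-based pairs above are consistent with the following standard-library sketch. It assumes linearly interpolated quartiles, with the S-Plus variant using `Q1 - 1.5*IQR` and `Q3 + 1.5*IQR` and the plain quartile variant using the median plus or minus the interquartile range. These formulas are inferred from the printed numbers and are not taken from the package source; the helper names are hypothetical.

```python
import statistics

# The vector printed earlier in this README (values as printed, 8 decimals).
vec = [41.02678223, 11.58372049, 13.47953057, 8.55326868, -30.086588,
       12.89355626, -20.02337245, 14.22218902, -1.16410111, 31.6905813,
       6.27421752, 10.2932275, -11.51138939, 22.84504148, 6.39326577,
       22.40600507, 26.21948669, 25.55871733, 5.25020644, -27.83824691,
       -13.44243588, 26.72413943, 30.18546801, 35.86198722, -0.98662331,
       -9.6342573, 28.29345516, 27.46140757, 10.44222283, 9.91712833]


def quartiles_sketch(data):
    """Return (Q1, median, Q3) using linear interpolation between data points."""
    # method="inclusive" treats the data as containing its own min and max.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3


def splus_quartile_thresholds_sketch(data):
    """Hypothetical helper: (Q1 - 1.5*IQR, Q3 + 1.5*IQR), an assumed formula."""
    q1, _, q3 = quartiles_sketch(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def quartile_thresholds_sketch(data):
    """Hypothetical helper: (median - IQR, median + IQR), an assumed formula."""
    q1, q2, q3 = quartiles_sketch(data)
    iqr = q3 - q1
    return q2 - iqr, q2 + iqr


splus = splus_quartile_thresholds_sketch(vec)
quart = quartile_thresholds_sketch(vec)
print(splus)
print(quart)
```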
------
## References
[AA1] Anton Antonov,
["Outlier detection in a list of numbers"](https://mathematicaforprediction.wordpress.com/2013/10/16/outlier-detection-in-a-list-of-numbers/),
(2013),
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).

[AAp1] Anton Antonov,
[OutlierIdentifiers WL paclet](https://resources.wolframcloud.com/PacletRepository/resources/AntonAntonov/OutlierIdentifiers/),
(2023),
[Wolfram Language Paclet Repository](https://resources.wolframcloud.com/PacletRepository/).

[AAp2] Anton Antonov,
[OutlierIdentifiers R package](https://github.com/antononcube/R-packages/tree/master/OutlierIdentifiers),
(2019),
[R-packages at GitHub/antononcube](https://github.com/antononcube/R-packages).

[AAp3] Anton Antonov,
[OutlierIdentifiers Raku package](https://github.com/antononcube/Raku-Statistics-OutlierIdentifiers),
(2022),
[GitHub/antononcube](https://github.com/antononcube/).