# au: Outlier Detection Toolkit
Filtering outliers to find the golden nuggets that stand out from the rest.
To install: `pip install au`
Outlier detection is a fundamental step in data analysis, particularly relevant in statistics, data mining, and machine learning. This toolkit provides Python functions and classes for identifying outliers: observations that differ significantly from the majority of the data. It accommodates several methodologies, ranging from statistical methods to machine-learning-based approaches.
## Features
1. **Z-Score Based Outlier Detection**
- Detects outliers by measuring how many standard deviations an element is from the mean.
- Suitable for datasets where the distribution is expected to be Gaussian.
2. **Interquartile Range (IQR) Based Outlier Detection**
- Uses the IQR, which is the difference between the 75th and 25th percentiles of the data.
- More robust than the z-score on skewed distributions, because quartiles are barely affected by extreme values.
3. **Isolation Forest Based Outlier Detection**
- Implements the Isolation Forest algorithm, a machine learning method for anomaly detection.
- Well suited to high-dimensional datasets.
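
The two statistical rules listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the toolkit's own implementation: the function names `zscore_outliers` and `iqr_outliers`, the z-score threshold of 2, and the IQR multiplier of 1.5 are all assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold` (hypothetical helper)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    # "inclusive" interpolates linearly between the sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 12, 13, 12, 11, 40]
print(zscore_outliers(data))  # [40]
print(iqr_outliers(data))     # [40]
```

The threshold matters on small samples: in this seven-point list the z-score of 40 is about 2.44, so the common cutoff of 3 would flag nothing, while the IQR rule (bounds 10.0 and 14.0 here) still isolates it.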
## Installation
Ensure that you have Python installed on your system. This toolkit requires `numpy` and `scikit-learn`. They can be installed via pip:
```
pip install numpy scikit-learn
```
## Usage
1. **Z-Score Based Outlier Detection**
```python
from outlier_detection import detect_outliers_zscore
outliers = detect_outliers_zscore([10, 12, 12, 13, 12, 11, 40])
```
2. **Interquartile Range (IQR) Based Outlier Detection**
```python
from outlier_detection import detect_outliers_iqr
outliers = detect_outliers_iqr([10, 12, 12, 13, 12, 11, 40])
```
3. **Isolation Forest Based Outlier Detection**
```python
from outlier_detection import IsolationForestOutlierDetector
detector = IsolationForestOutlierDetector()
outliers = detector.detect_outliers([10, 12, 12, 13, 12, 11, 40])
```
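
For readers curious about what an Isolation Forest detector does with such a list, here is a hedged sketch that calls scikit-learn's `IsolationForest` directly. It is not the toolkit's `IsolationForestOutlierDetector`, whose internals are not shown here; the `contamination` value of 0.15 and the fixed `random_state` are assumptions made for the illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

data = [10, 12, 12, 13, 12, 11, 40]

# scikit-learn estimators expect a 2-D array of shape (n_samples, n_features),
# so the flat list is reshaped into a single-feature column.
X = np.asarray(data, dtype=float).reshape(-1, 1)

# contamination is the expected share of outliers; 0.15 is an illustrative guess.
model = IsolationForest(contamination=0.15, random_state=0)
labels = model.fit_predict(X)  # 1 marks inliers, -1 marks outliers

outliers = [value for value, label in zip(data, labels) if label == -1]
print(outliers)
```

Because the forest is built from random splits, results can vary with the seed and with `contamination`, which effectively sets how many points are reported.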
## Documentation
Each function and class in this toolkit comes with a detailed docstring, explaining its purpose, parameters, return values, and examples.
## Contributing
Contributions to this project are welcome! Please fork the repository and submit a pull request with your changes.
Raw data
{
"_id": null,
"home_page": "https://github.com/thorwhalen/uu/tree/master/au",
"name": "au",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Thor Whalen",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/77/38/c518dbafb6f9736a1337cf030e142b0cc00af02f1f6939a6e1a6f77975bd/au-0.0.7.tar.gz",
"platform": "any",
"description": "\n# au: Outlier Detection Toolkit\n\nFiltering outliers to find the golden nuggets that standout from the rest.\n\nTo install:\t```pip install au```\n\nOutlier detection is a fundamental step in data analysis, particularly relevant in statistics, data mining, and machine learning. This toolkit provides a set of functions and classes in Python for identifying outliers - observations in data that are significantly different from the majority. The toolkit is designed to accommodate various methodologies, ranging from statistical methods to machine learning-based approaches.\n\n## Features\n\n1. **Z-Score Based Outlier Detection**\n - Detects outliers by measuring how many standard deviations an element is from the mean.\n - Suitable for datasets where the distribution is expected to be Gaussian.\n\n2. **Interquartile Range (IQR) Based Outlier Detection**\n - Utilizes the IQR, which is the difference between the 75th and 25th percentile of the data.\n - Effective for skewed distributions.\n\n3. **Isolation Forest Based Outlier Detection**\n - Implements the Isolation Forest algorithm, a machine learning method for anomaly detection.\n - Ideal for high-dimensional datasets.\n\n## Installation\n\nEnsure that you have Python installed on your system. This toolkit requires `numpy` and `scikit-learn`. They can be installed via pip.\n\n```\npip install numpy scikit-learn\n```\n\n## Features\n\n1. **Z-Score Based Outlier Detection**\n - Detects outliers by measuring how many standard deviations an element is from the mean.\n - Suitable for datasets where the distribution is expected to be Gaussian.\n\n2. **Interquartile Range (IQR) Based Outlier Detection**\n - Utilizes the IQR, which is the difference between the 75th and 25th percentile of the data.\n - Effective for skewed distributions.\n\n3. 
**Isolation Forest Based Outlier Detection**\n - Implements the Isolation Forest algorithm, a machine learning method for anomaly detection.\n - Ideal for high-dimensional datasets.\n\n## Installation\n\nEnsure that you have Python installed on your system. This toolkit requires `numpy` and `scikit-learn`. They can be installed via pip:\n\n```\npip install numpy scikit-learn\n```\n\n## Usage\n\n1. **Z-Score Based Outlier Detection**\n\n ```python\n from outlier_detection import detect_outliers_zscore\n\n outliers = detect_outliers_zscore([10, 12, 12, 13, 12, 11, 40])\n ```\n\n2. **Interquartile Range (IQR) Based Outlier Detection**\n\n ```python\n from outlier_detection import detect_outliers_iqr\n\n outliers = detect_outliers_iqr([10, 12, 12, 13, 12, 11, 40])\n ```\n\n3. **Isolation Forest Based Outlier Detection**\n\n ```python\n from outlier_detection import IsolationForestOutlierDetector\n\n detector = IsolationForestOutlierDetector()\n outliers = detector.detect_outliers([10, 12, 12, 13, 12, 11, 40])\n ```\n\n## Documentation\n\nEach function and class in this toolkit comes with a detailed docstring, explaining its purpose, parameters, return values, and examples.\n\n\n## Contributing\n\nContributions to this project are welcome! Please fork the repository and submit a pull request with your changes.\n",
"bugtrack_url": null,
"license": "apache-2.0",
"summary": "Filtering outliers",
"version": "0.0.7",
"project_urls": {
"Homepage": "https://github.com/thorwhalen/uu/tree/master/au"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "635b9669bf754d17e6545946c11eee12b9c6eb1d614d2c8eefa27f96749df161",
"md5": "9c7da6231df76d6be8e530eb66e11a42",
"sha256": "9c6a14e8702206c16add0ca72465c66922d2bed6cb041ef4c6640237b6cf8fb1"
},
"downloads": -1,
"filename": "au-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9c7da6231df76d6be8e530eb66e11a42",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7985,
"upload_time": "2024-01-19T20:07:07",
"upload_time_iso_8601": "2024-01-19T20:07:07.019126Z",
"url": "https://files.pythonhosted.org/packages/63/5b/9669bf754d17e6545946c11eee12b9c6eb1d614d2c8eefa27f96749df161/au-0.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7738c518dbafb6f9736a1337cf030e142b0cc00af02f1f6939a6e1a6f77975bd",
"md5": "df94e476e6dfa91426276125a9fc8482",
"sha256": "bca38d5ca7bdb687fb5d97646bf7d3c5504ac1b3e960c2b73a84c6b3b960a3af"
},
"downloads": -1,
"filename": "au-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "df94e476e6dfa91426276125a9fc8482",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7598,
"upload_time": "2024-01-19T20:07:08",
"upload_time_iso_8601": "2024-01-19T20:07:08.531131Z",
"url": "https://files.pythonhosted.org/packages/77/38/c518dbafb6f9736a1337cf030e142b0cc00af02f1f6939a6e1a6f77975bd/au-0.0.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-19 20:07:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thorwhalen",
"github_project": "uu",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "au"
}