outlier-toolkit


- Name: outlier-toolkit
- Version: 0.1.3
- home_page: https://github.com/irenebetsy/outlier_library
- Summary: A Python library for identifying and handling outliers
- upload_time: 2025-08-30 11:38:28
- maintainer: None
- docs_url: None
- author: Irene Betsy D
- requires_python: >=3.7
- license: Apache License 2.0
- keywords: outlier, iqr, zscore, winsorization, binning, data, preprocessing, analytics
# 🧰 Outlier Toolkit 🛠️

A **standalone Python library** for detecting, handling, and transforming outliers in numeric and categorical data.  
No external dependencies required.

---

## 📜 License

This project is licensed under the **Apache License 2.0**. See the [LICENSE](./LICENSE) file for more details.

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)

---


## 📊 Features

### 1. Outlier Detection
- **Z-score Detection**: Identify extreme values based on standard deviation.
- **IQR Detection**: Detect outliers using the interquartile range (Q1, Q3).
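
The two detection rules above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names, the z-score threshold of 3, the IQR multiplier of 1.5, and the linear-interpolation quantile are all assumptions made for the example.

```python
import statistics


def quantile(ordered, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold` (assumed default)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing is extreme
    return [v for v in values if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed default)."""
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [1, 2, 85, 95, 65, 75, 53, 67, 87, 89, 93, 1001, 1027, 3018]
print(zscore_outliers(data))  # [3018]
print(iqr_outliers(data))     # [1, 2, 1001, 1027, 3018]
```

On this sample the z-score rule flags only the largest value, because the extreme values inflate the standard deviation, while the IQR rule flags both tails.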

### 2. Outlier Handling Techniques
- **Remove Outliers**: Drop outlier values from datasets.
- **Replace Outliers**: Replace outliers with mean, median, or most frequent values.
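
As a rough sketch of the two handling strategies above, the following drops or replaces values that fall outside the IQR fences. The function names, the `strategy` parameter, and the choice to compute the replacement from the inliers only are assumptions for illustration; the library's own behaviour may differ.

```python
import statistics
from collections import Counter


def iqr_fences(values, k=1.5):
    """Return (low, high) fences computed from Q1, Q3 and the IQR."""
    ordered = sorted(values)

    def quantile(q):
        # Linear-interpolation quantile on the sorted copy.
        pos = (len(ordered) - 1) * q
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values inside the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


def fill_outliers(values, strategy="median", k=1.5):
    """Replace each outlier with the mean, median, or most frequent inlier."""
    low, high = iqr_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    if strategy == "mean":
        fill = statistics.mean(inliers)
    elif strategy == "median":
        fill = statistics.median(inliers)
    elif strategy == "most_frequent":
        fill = Counter(inliers).most_common(1)[0][0]
    else:
        raise ValueError("unknown strategy: " + strategy)
    return [v if low <= v <= high else fill for v in values]


data = [1, 2, 85, 95, 65, 75, 53, 67, 87, 89, 93, 1001, 1027, 3018]
print(drop_outliers(data))  # [85, 95, 65, 75, 53, 67, 87, 89, 93]
print(fill_outliers(data))  # outliers become 85, the median of the inliers
```

Removal shortens the list, while replacement keeps its length and order, which matters when the values are aligned with other columns.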

### 3. Winsorization
- **Standard Winsorization**: Cap extreme values at a fixed percentile.
- **Adaptive Quartiles**: Replace low/high outliers using Q1 and Q3.
- **Adaptive Inliers**: Replace low/high outliers using nearest inlier values (custom method).
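
One possible reading of the three variants above is sketched below. The function names, the 5%/95% percentile defaults, the 1.5 IQR multiplier, and the interpretation of "nearest inlier" as the smallest or largest value inside the fences are assumptions, not the library's confirmed behaviour.

```python
def quantile(ordered, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = (len(ordered) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def iqr_fences(values, k=1.5):
    """Return (q1, q3, low_fence, high_fence)."""
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr


def winsorize_percentile(values, lower=0.05, upper=0.95):
    """Standard variant: clamp every value to the chosen percentiles."""
    ordered = sorted(values)
    lo, hi = quantile(ordered, lower), quantile(ordered, upper)
    return [min(max(v, lo), hi) for v in values]


def winsorize_to_quartiles(values, k=1.5):
    """Adaptive quartiles: low outliers become Q1, high outliers become Q3."""
    q1, q3, low, high = iqr_fences(values, k)
    return [q1 if v < low else q3 if v > high else v for v in values]


def winsorize_to_inliers(values, k=1.5):
    """Adaptive inliers: outliers become the nearest value inside the fences."""
    _, _, low, high = iqr_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    nearest_low, nearest_high = min(inliers), max(inliers)
    return [nearest_low if v < low else nearest_high if v > high else v
            for v in values]


data = [1, 2, 85, 95, 65, 75, 53, 67, 87, 89, 93, 1001, 1027, 3018]
print(winsorize_percentile(data))
print(winsorize_to_quartiles(data))
print(winsorize_to_inliers(data))  # [53, 53, 85, 95, 65, 75, 53, 67, 87, 89, 93, 95, 95, 95]
```

In this sketch the percentile variant changes only the values beyond the cut-offs, the quartile variant can introduce interpolated values (here 65.5 and 94.5), and the inlier variant only ever reuses values already present in the data.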

### 4. Binning
- **Equal Width Binning**: Divide numeric range into equal-width intervals.
- **Equal Frequency Binning**: Divide data so each bin has approximately the same number of values.
- **Auto Binning (Outlier-based)**: Automatically separate low/high outliers and inliers using IQR.
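
The binning strategies above might look like the following in plain Python. This is an illustrative sketch under stated assumptions: the function names, the `n_bins` parameter, the integer bin indices, and the `"low"`/`"inlier"`/`"high"` labels are hypothetical and are not taken from the library.

```python
def equal_width_bins(values, n_bins=3):
    """Assign each value a bin index; every bin spans the same numeric width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins=3):
    """Assign bin indices so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def outlier_bins(values, k=1.5):
    """Label each value as a low outlier, an inlier, or a high outlier via IQR."""
    ordered = sorted(values)

    def quantile(q):
        # Linear-interpolation quantile on the sorted copy.
        pos = (len(ordered) - 1) * q
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return ["low" if v < low else "high" if v > high else "inlier" for v in values]


data = [1, 2, 85, 95, 65, 75, 53, 67, 87, 89, 93, 1001, 1027, 3018]
print(equal_width_bins(data))      # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
print(equal_frequency_bins(data, n_bins=2))
print(outlier_bins(data))
```

With skewed data like this sample, equal-width binning puts almost everything into the first bin, whereas equal-frequency binning splits the values evenly regardless of their spread.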

---

## 🔧 Installation

```bash
pip install outlier-toolkit
```

No external libraries required. Compatible with Python 3.7+.

---

## 🧮 Usage

```python
from outlier.i_outlier.Zscore import detect_outliers_zscore
from outlier.i_outlier.IQR import detect_outliers_iqr
from outlier.outlierTech.remove import remove_outliers
from outlier.outlierTech.replace import replace_outliers
from outlier.outlierTech.winsorization.standard import winsorize_standard
from outlier.outlierTech.winsorization.adaptive import winsorize_quartiles, winsorize_inliers
from outlier.outlierGroup.binning import eq_width_bin, eq_freq_bin, custom_binning

# Sample test data
numeric_data = [1, 2, 85, 95, 65, 75, 53, 67, 87, 89, 93, 1001, 1027, 3018]
categorical_data = ["Male", "Female", "Male", "Male", "Unknown", "Unknown", "Other"]

# Detection
print("=== Zscore Detection ===")
print(detect_outliers_zscore(numeric_data))

print("\n=== IQR Detection ===")
print(detect_outliers_iqr(numeric_data))

# Handling
print("\n=== Remove Outliers ===")
print(remove_outliers(numeric_data, method="IQR"))

print("\n=== Replace Outliers (auto-detect) ===")
print(replace_outliers(numeric_data, method="IQR"))
print(replace_outliers(categorical_data, method="IQR"))

# Winsorization (numeric_data[:] passes a copy, so the sample list stays unchanged)
print("\n=== Winsorization (Standard 5%) ===")
print(winsorize_standard(numeric_data[:]))

print("\n=== Winsorization (Adaptive Quartiles) ===")
print(winsorize_quartiles(numeric_data[:]))

print("\n=== Winsorization (Adaptive Inliers) ===")
print(winsorize_inliers(numeric_data[:]))

# Binning
print("\n=== Binning (Equal Width Binning) ===")
print(eq_width_bin(numeric_data[:]))

print("\n=== Binning (Equal Frequency Binning) ===")
print(eq_freq_bin(numeric_data[:]))

print("\n=== Binning (Custom Binning) ===")
print(custom_binning(numeric_data[:]))
```

---

## 📝 Notes

- Works for numeric and categorical data.
- All functions are standalone and do not require external libraries.
- The custom winsorization method (`winsorize_inliers`) maps low and high outliers to the nearest inlier values, giving more control over the transformation.

---

## 👩‍💻 Author
**Irene Betsy D** 

---





            
