eazyml-dq


- Name: eazyml-dq
- Version: 0.0.19
- Home page: https://eazyml.com/
- Summary: eazyml-dq provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.
- Upload time: 2025-01-27 12:39:29
- Author: Eazyml
- Requires Python: >=3.7
- Keywords: data-quality, bias-detection, outlier-detection, data-drift, model-drift, missing-values, correlation-analysis, data-imputation, data-balance, data-quality-tests, ml-api

# Eazyml Data Quality
![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)  ![PyPI package](https://img.shields.io/badge/pypi%20package-0.0.17-brightgreen) ![Code Style](https://img.shields.io/badge/code%20style-black-black)

![EazyML](https://github.com/EazyML/eazyml-docs/raw/refs/heads/master/EazyML_logo.png)

## Overview
**eazyml-dq** is a Python utility that evaluates dataset quality by checking data shape, emptiness, outliers, balance, and correlation. It helps users identify potential issues in their datasets and provides detailed feedback to ensure data readiness for downstream processes.
It offers APIs for data quality assessment across the dimensions listed under Features.

## Features
- **Missing Value Analysis**: Detects and imputes missing values.
- **Bias Detection**: Uncovers and mitigates bias in datasets.
- **Data Drift and Model Drift Analysis**: Monitors changes in data distributions over time.
- **Data Shape Quality**: Validates dataset dimensions and checks whether the number of rows is sufficient relative to the number of columns.
- **Data Emptiness Check**: Identifies and reports missing values in the dataset.
- **Outlier Detection**: Detects and removes outliers based on statistical analysis.
- **Data Balance Check**: Analyzes the balance of the dataset and computes a balance score.
- **Correlation Analysis**: Identifies multicollinearity and relationships between features, and raises alerts for highly correlated features.
- **Summary Alerts**: Consolidates key quality issues into a single summary for quick review.

With eazyml-dq, you can verify that your training data is clean, balanced, and ready for machine learning.
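
To make the checks listed under Features more concrete, here is a minimal standard-library sketch of the kind of statistics such checks typically rely on. It is not the eazyml-dq implementation: every function name, the 10-rows-per-column threshold, the use of Tukey's IQR rule, and the min/max balance ratio are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt
from statistics import quantiles


def shape_ok(n_rows, n_cols, min_rows_per_col=10):
    """True if the dataset has at least `min_rows_per_col` rows per column.

    The default of 10 is a hypothetical rule of thumb, not a library value.
    """
    return n_rows >= min_rows_per_col * n_cols


def missing_fraction(rows, column):
    """Share of rows (dicts) where `column` is absent, None, or an empty string."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows) if rows else 0.0


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def balance_score(labels):
    """Rarest class count divided by most common class count (1.0 = balanced)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)
```

A pair of features whose `pearson` value is close to 1 or -1 would be a candidate for a "highly correlated" alert; the cutoff to use is a modelling decision and is not specified in this README.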

## Installation
To use eazyml-dq, ensure you have Python 3.7 or later installed on your system.
### User installation
The easiest way to install eazyml-dq is with pip:
```bash
pip install -U eazyml-dq
```
### Dependencies
Eazyml Data Quality requires:
- pandas==2.0.3
- scikit-learn==1.3.2
- numpy==1.24.3
- openpyxl
- flask
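
Because three of these dependencies are pinned to exact versions, it can help to compare the pins against the current environment before or after installing. The sketch below uses only `importlib.metadata` from the standard library (available from Python 3.8); the helper name `check_pins` is hypothetical and not part of eazyml-dq.

```python
from importlib import metadata  # standard library since Python 3.8

# Pins copied from the Dependencies list above.
PINNED = {"pandas": "2.0.3", "scikit-learn": "1.3.2", "numpy": "1.24.3"}


def check_pins(pins):
    """Map each package name to (expected, installed).

    `installed` is None when the package is not present in this environment.
    """
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        status = "ok" if installed == expected else "mismatch"
        print(f"{name}: expected {expected}, found {installed} ({status})")
```

If the report shows mismatches, installing into a fresh virtual environment is usually the simplest way to avoid conflicts with packages already on the system.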

## Usage

The `ez_data_quality` function evaluates the quality of the provided dataset and returns a detailed report.

```python
from eazyml_dq import ez_init, ez_data_quality
# Replace 'your_license_key' with your actual EazyML license key
ez_init(license_key="your_license_key")

# Specify the file path for the dataset
file_path = 'path/to/dataset.csv'
outcome = 'outcome_column_name'
options = {
    "data_shape": "yes",
    "data_balance": "yes",
    "data_emptiness": "yes",
    "data_outliers": "yes",
    "remove_outliers": "yes",
    "outcome_correlation": "yes",
}

# Perform data quality checks
result = ez_data_quality(filename=file_path, outcome=outcome, options=options)

# Access specific quality metrics
if result["success"]:
    print("Data Shape Quality:", result["data_shape_quality"])
    print("Outlier Quality:", result["data_outliers_quality"])
    print("Bad Quality Alerts:", result["data_bad_quality_alerts"])
else:
    print("Error:", result["message"])
```
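
The two helpers below are a small sketch of how the `options` dictionary and the returned report from the example above could be handled more defensively. They are not part of eazyml-dq: `build_options` and `summarize` are hypothetical names, and the assumption that `"no"` disables a check is a guess, since only `"yes"` appears in this README. The option and result keys are the ones used in the example.

```python
# Option keys taken from the usage example above.
CHECKS = (
    "data_shape",
    "data_balance",
    "data_emptiness",
    "data_outliers",
    "remove_outliers",
    "outcome_correlation",
)


def build_options(**flags):
    """Turn boolean flags into "yes"/"no" strings; unspecified checks default to "yes".

    Assumption: "no" switches a check off. Only "yes" is shown in the README.
    """
    unknown = set(flags) - set(CHECKS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return {name: "yes" if flags.get(name, True) else "no" for name in CHECKS}


def summarize(result):
    """Return printable lines for a result dict without raising on missing keys."""
    if not result.get("success"):
        return ["Error: " + str(result.get("message", "unknown error"))]
    keys = ("data_shape_quality", "data_outliers_quality", "data_bad_quality_alerts")
    return [f"{key}: {result[key]}" for key in keys if key in result]
```

For example, `build_options(remove_outliers=False)` keeps every check enabled but asks for outliers to be reported rather than removed, under the assumption stated above. Rejecting unknown keys early catches typos in option names before the call to `ez_data_quality`.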
You can find more information in the [documentation](https://eazyml.readthedocs.io/en/latest/packages/eazyml_dq.html).


## Useful links and similar projects
- [Documentation](https://docs.eazyml.com)
- [Homepage](https://eazyml.com)
- If you have more questions or want to discuss a specific use case, please book an appointment [here](https://eazyml.com/trust-in-ai).
- Other EazyML packages:

    - [eazyml](https://pypi.org/project/eazyml/): `eazyml` provides a suite of APIs for training, testing, and optimizing machine learning models with built-in AutoML capabilities, hyperparameter tuning, and cross-validation.
    - [eazyml-dq](https://pypi.org/project/eazyml-dq/): `eazyml-dq` provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.
    - [eazyml-cf](https://pypi.org/project/eazyml-cf/): `eazyml-cf` provides APIs for counterfactual explanations, prescriptive analytics, and actionable insights to optimize predictive outcomes.
    - [eazyml-augi](https://pypi.org/project/eazyml-augi/): `eazyml-augi` provides APIs to uncover patterns, generate insights, and discover rules from training datasets.
    - [eazyml-xai](https://pypi.org/project/eazyml-xai/): `eazyml-xai` provides APIs for explainable AI (XAI), offering human-readable explanations, feature importance, and predictive reasoning.
    - [eazyml-xai-image](https://pypi.org/project/eazyml-xai-image/): `eazyml-xai-image` provides APIs for image explainable AI (XAI).

## License
This project is licensed under the [Proprietary License](https://github.com/EazyML/eazyml-docs/blob/master/LICENSE).

---

*Maintained by [EazyML](https://eazyml.com)*  
*© 2025 EazyML. All rights reserved.*

            

Raw data

            {
    "_id": null,
    "home_page": "https://eazyml.com/",
    "name": "eazyml-dq",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "data-quality, bias-detection, outlier-detection, data-drift, model-drift, missing-values, correlation-analysis, data-imputation, data-balance, data-quality-tests, ml-api",
    "author": "Eazyml",
    "author_email": "admin@ipsoftlabs.com",
    "download_url": "https://files.pythonhosted.org/packages/ad/f2/3dbb9d5517af306e6751d69352de105bbe834232fddea7be037ee66bbbff/eazyml-dq-0.0.19.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "eazyml-dq provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.",
    "version": "0.0.19",
    "project_urls": {
        "Contact Us": "https://eazyml.com/trust-in-ai",
        "Documentation": "https://docs.eazyml.com/",
        "Homepage": "https://eazyml.com/",
        "eazyml": "https://pypi.org/project/eazyml/",
        "eazyml-augi": "https://pypi.org/project/eazyml-augi/",
        "eazyml-cf": "https://pypi.org/project/eazyml-cf/",
        "eazyml-dq": "https://pypi.org/project/eazyml-dq/",
        "eazyml-xai": "https://pypi.org/project/eazyml-xai/",
        "eazyml-xai-image": "https://pypi.org/project/eazyml-xai-image/"
    },
    "split_keywords": [
        "data-quality",
        " bias-detection",
        " outlier-detection",
        " data-drift",
        " model-drift",
        " missing-values",
        " correlation-analysis",
        " data-imputation",
        " data-balance",
        " data-quality-tests",
        " ml-api"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d84d91b269693a2e5a35456390cc98896be237196d593e3e741b232bd54e4ca3",
                "md5": "fcafa4021172458c055d030d52e7ffed",
                "sha256": "fd83504ce2042a4a946e80afac126deaebb3fc7dc5c9508a3929fed87d69d214"
            },
            "downloads": -1,
            "filename": "eazyml_dq-0.0.19-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fcafa4021172458c055d030d52e7ffed",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7",
            "size": 22495230,
            "upload_time": "2025-01-27T12:39:17",
            "upload_time_iso_8601": "2025-01-27T12:39:17.713009Z",
            "url": "https://files.pythonhosted.org/packages/d8/4d/91b269693a2e5a35456390cc98896be237196d593e3e741b232bd54e4ca3/eazyml_dq-0.0.19-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "adf23dbb9d5517af306e6751d69352de105bbe834232fddea7be037ee66bbbff",
                "md5": "528e230220d7a4648dbbc80492d9393b",
                "sha256": "cf9ad52d98021afe48c2d026bb61d4fc3a27312f182c91f38f7d5f2356651703"
            },
            "downloads": -1,
            "filename": "eazyml-dq-0.0.19.tar.gz",
            "has_sig": false,
            "md5_digest": "528e230220d7a4648dbbc80492d9393b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 21864053,
            "upload_time": "2025-01-27T12:39:29",
            "upload_time_iso_8601": "2025-01-27T12:39:29.812990Z",
            "url": "https://files.pythonhosted.org/packages/ad/f2/3dbb9d5517af306e6751d69352de105bbe834232fddea7be037ee66bbbff/eazyml-dq-0.0.19.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-27 12:39:29",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "eazyml-dq"
}
        