eazyml-data-quality


- Name: eazyml-data-quality
- Version: 0.0.30
- Home page: https://eazyml.com/
- Summary: eazyml-data-quality from EazyML family for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.
- Upload time: 2025-02-27 16:13:12
- Author: EazyML
- Requires Python: >=3.7
- Keywords: data-quality, bias-detection, outlier-detection, data-drift, model-drift, missing-values, correlation-analysis, data-imputation, data-balance, data-quality-tests, ml-api
## EazyML Responsible-AI: Data Quality Assessment
![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)  ![PyPI package](https://img.shields.io/badge/pypi%20package-0.0.30-brightgreen) ![Code Style](https://img.shields.io/badge/code%20style-black-black)

![EazyML](https://github.com/EazyML/eazyml-docs/raw/refs/heads/master/EazyML_logo.png)

## Overview
`eazyml-data-quality` is a Python utility that evaluates dataset quality by running checks such as data shape, emptiness, outlier detection, balance, and correlation. It helps users identify potential issues in their datasets and provides detailed feedback on whether the data is ready for downstream processes. Its APIs cover the data quality dimensions listed under Features.

## Features
- **Missing Value Analysis**: Detects and imputes missing values.
- **Bias Detection**: Uncovers and mitigates bias in datasets.
- **Data Drift and Model Drift Analysis**: Monitors changes in data distributions over time.
- **Data Shape Quality**: Validates dataset dimensions and checks whether the number of rows is sufficient relative to the number of columns.
- **Data Emptiness Check**: Identifies and reports missing values in the dataset.
- **Outlier Detection**: Detects and removes outliers based on statistical analysis.
- **Data Balance Check**: Analyzes the balance of the dataset and computes a balance score.
- **Correlation Analysis**: Identifies multicollinearity and relationships between features, and raises alerts for highly correlated features.
- **Summary Alerts**: Consolidates key quality issues into a single summary for quick review.

With eazyml-data-quality, you can verify that your training data is clean, balanced, and ready for machine learning.
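This README does not document the algorithms behind these checks. For intuition about what such checks typically compute, here is a minimal pandas sketch. The function name `basic_quality_report`, the rows-per-column ratio, the entropy-based balance score, the 1.5 × IQR outlier rule (Tukey's fences), and the 0.9 correlation threshold are all illustrative assumptions, not the eazyml-data-quality implementation.

```python
import numpy as np
import pandas as pd


def basic_quality_report(df: pd.DataFrame, outcome: str, corr_threshold: float = 0.9) -> dict:
    """Illustrative data quality checks (hypothetical; NOT how eazyml-data-quality works)."""
    features = df.drop(columns=[outcome])
    numeric = features.select_dtypes(include="number")

    # Data shape: number of rows per feature column (a simple sufficiency ratio)
    rows_per_column = len(df) / max(features.shape[1], 1)

    # Data emptiness: missing-value counts, keeping only columns that have gaps
    missing = df.isna().sum()
    missing_values = missing[missing > 0].to_dict()

    # Data balance: normalized Shannon entropy of the outcome classes
    # (1.0 = perfectly balanced; values near 0 = one class dominates)
    proportions = df[outcome].value_counts(normalize=True)
    if len(proportions) > 1:
        entropy = -(proportions * np.log(proportions)).sum()
        balance_score = float(entropy / np.log(len(proportions)))
    else:
        balance_score = 0.0

    # Outliers: rows with any numeric value outside Tukey's 1.5 * IQR fences
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    outlier_rows = df.index[outside.any(axis=1).to_numpy()].tolist()

    # Correlation: feature pairs whose absolute Pearson correlation exceeds the threshold
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    high_corr = [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] > corr_threshold
    ]

    return {
        "rows_per_column": rows_per_column,
        "missing_values": missing_values,
        "balance_score": balance_score,
        "outlier_rows": outlier_rows,
        "highly_correlated_pairs": high_corr,
    }
```

The package itself returns its own keys (such as `data_shape_quality`), so treat this only as a way to reason about what each check measures.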

## Installation
To use `eazyml-data-quality`, you need Python 3.7 or later.
### User installation
The easiest way to install `eazyml-data-quality` is with pip:
```bash
pip install -U eazyml-data-quality
```
### Dependencies
This package requires:
- pandas
- scikit-learn
- numpy
- openpyxl
- flask
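Before running the usage example, you may want to confirm that the package and its dependencies are importable. This is a sketch using only the standard library; the helper `missing_dependencies` is hypothetical, and it assumes the import names `sklearn` (for scikit-learn) and `eazyml_data_quality` (matching the import in the Usage section).

```python
import importlib.util

# Distribution name -> import name (scikit-learn is imported as "sklearn")
DEPENDENCIES = {
    "eazyml-data-quality": "eazyml_data_quality",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "openpyxl": "openpyxl",
    "flask": "flask",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the distribution names whose top-level module cannot be found."""
    return [dist for dist, module in deps.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running it as a script lists anything that still needs a `pip install`.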

## Usage
Here's an example of how you can use the APIs from this package.
```python
from eazyml_data_quality import ez_init, ez_data_quality

# Initialize: set up book-keeping and the access key, if required
_ = ez_init()

# Perform data quality checks.
# The paths below are placeholders; a pandas DataFrame can be passed instead.
response = ez_data_quality(
    train_data="path/to/train_data_file",  # DataFrame or path to the training data
    outcome="target",  # name of the outcome (target) column
    options={
        # Each check is switched on with "yes" or off with "no"
        "data_shape": "yes",
        "data_balance": "yes",
        "data_emptiness": "yes",
        "impute": "yes",
        "data_outliers": "yes",
        "remove_outliers": "yes",
        "outcome_correlation": "yes",
        "data_drift": "yes",
        "model_drift": "yes",
        "test_data": "path/to/test_data_file",  # DataFrame or path to the test data
        "data_completeness": "yes",
        "data_correctness": "yes",
    },
)

# Access specific quality metrics
if response["success"]:
    print("Data Shape Quality:", response["data_shape_quality"])
    print("Outlier Quality:", response["data_outliers_quality"])
    print("Bad Quality Alerts:", response["data_bad_quality_alerts"])
else:
    print("Error:", response["message"])
```
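Spelling out every flag is repetitive when only a few checks are needed. The helper below is a hypothetical convenience, not part of eazyml-data-quality: it builds the `options` dict from the flag names shown in the usage example, marking requested checks `"yes"` and all others `"no"`. It assumes the library accepts the full set of keys at once, as the example does.

```python
# Check flags accepted in `options`, as listed in the usage example
CHECK_FLAGS = (
    "data_shape", "data_balance", "data_emptiness", "impute",
    "data_outliers", "remove_outliers", "outcome_correlation",
    "data_drift", "model_drift", "data_completeness", "data_correctness",
)


def build_options(enabled, test_data=None):
    """Build an `options` dict: "yes" for enabled checks, "no" for the rest.

    Hypothetical helper; not part of eazyml-data-quality.
    """
    unknown = set(enabled) - set(CHECK_FLAGS)
    if unknown:
        # Fail fast on typos instead of sending an unrecognized key
        raise ValueError(f"unknown check(s): {sorted(unknown)}")
    options = {flag: "yes" if flag in enabled else "no" for flag in CHECK_FLAGS}
    if test_data is not None:
        # DataFrame or path to the test data
        options["test_data"] = test_data
    return options
```

For example, `ez_data_quality(train_data=..., outcome="target", options=build_options({"data_shape", "data_outliers"}))` runs only the shape and outlier checks. Rejecting unknown names catches misspelled flags before the call.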
You can find more information in the [documentation](https://eazyml.readthedocs.io/en/latest/packages/eazyml_data_quality.html).
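This README does not describe how `data_drift` is measured. For intuition about what a data drift score captures, here is a sketch of one common, swapped-in technique, the population stability index (PSI), computed for a single numeric feature between a reference sample (such as the training data) and a newer sample (such as the test data). The function name, the 10 quantile bins, and the `eps` floor are assumptions; this is not the eazyml-data-quality algorithm.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a new sample of one numeric feature.

    Illustrative drift measure; not the method used by eazyml-data-quality.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the reference sample (duplicates dropped)
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    if len(edges) < 2:
        # Constant reference sample: fall back to a single bin
        edges = np.array([-np.inf, np.inf])
    # Open the outer edges so values outside the reference range still land in a bin
    edges[0], edges[-1] = -np.inf, np.inf
    exp_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    act_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins at eps to avoid log(0) and division by zero
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))
```

A widely quoted rule of thumb reads PSI below 0.1 as little shift and above 0.25 as a significant shift; these cut-offs are conventions, not guarantees.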


## Useful links, other packages from EazyML family
- [Documentation](https://docs.eazyml.com)
- [Homepage](https://eazyml.com)
- If you have questions or would like to discuss a use case, please contact us [here](https://eazyml.com/trust-in-ai)
- The EazyML suite includes the following packages:

    - [eazyml-automl](https://pypi.org/project/eazyml-automl/): eazyml-automl provides a suite of APIs for training, optimizing and validating machine learning models with built-in AutoML capabilities, hyperparameter tuning, and cross-validation.
    - [eazyml-data-quality](https://pypi.org/project/eazyml-data-quality/): eazyml-data-quality provides APIs for comprehensive data quality assessment, including bias detection, outlier identification, and drift analysis for both data and models.
    - [eazyml-counterfactual](https://pypi.org/project/eazyml-counterfactual/): eazyml-counterfactual provides APIs for optimal prescriptive analytics, counterfactual explanations, and actionable insights, so that predicted outcomes align with your objectives.
    - [eazyml-insight](https://pypi.org/project/eazyml-insight/): eazyml-insight provides APIs to discover patterns, generate insights, and mine rules from your datasets.
    - [eazyml-xai](https://pypi.org/project/eazyml-xai/): eazyml-xai provides APIs for explainable AI (XAI), offering human-readable explanations, feature importance, and predictive reasoning.
    - [eazyml-xai-image](https://pypi.org/project/eazyml-xai-image/): eazyml-xai-image provides APIs for image explainable AI (XAI).

## License
This project is licensed under the [Proprietary License](https://github.com/EazyML/eazyml-docs/blob/master/LICENSE).

---

Maintained by [EazyML](https://eazyml.com)  
© 2025 EazyML. All rights reserved.

            

Raw data

            {
    "_id": null,
    "home_page": "https://eazyml.com/",
    "name": "eazyml-data-quality",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "data-quality, bias-detection, outlier-detection, data-drift, model-drift, missing-values, correlation-analysis, data-imputation, data-balance, data-quality-tests, ml-api",
    "author": "EazyML",
    "author_email": "admin@ipsoftlabs.com",
    "download_url": "https://files.pythonhosted.org/packages/7a/22/070debe4fe853fdee294de71ad32dd53e8c14946c7800288dc583485e4ed/eazyml_data_quality-0.0.30.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "eazyml-data-quality from EazyML family for comprehensive data quality assessment, including bias detection, outlier identification, and data drift analysis.",
    "version": "0.0.30",
    "project_urls": {
        "Contact Us": "https://eazyml.com/trust-in-ai",
        "Documentation": "https://docs.eazyml.com/",
        "Homepage": "https://eazyml.com/",
        "eazyml-automl": "https://pypi.org/project/eazyml-automl/",
        "eazyml-counterfactual": "https://pypi.org/project/eazyml-counterfactual/",
        "eazyml-data-quality": "https://pypi.org/project/eazyml-data-quality/",
        "eazyml-insight": "https://pypi.org/project/eazyml-insight/",
        "eazyml-xai": "https://pypi.org/project/eazyml-xai/",
        "eazyml-xai-image": "https://pypi.org/project/eazyml-xai-image/"
    },
    "split_keywords": [
        "data-quality",
        " bias-detection",
        " outlier-detection",
        " data-drift",
        " model-drift",
        " missing-values",
        " correlation-analysis",
        " data-imputation",
        " data-balance",
        " data-quality-tests",
        " ml-api"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a62d921ffa25c9dff2c76468b353cd81f9a43bd6145a0a863bd2820548ea2ee4",
                "md5": "446b394791300da756fdd1b08228658c",
                "sha256": "44e210b6199b15d64a1a4e8fd503bdf6916f847a26ba21466da261feb137e7ca"
            },
            "downloads": -1,
            "filename": "eazyml_data_quality-0.0.30-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "446b394791300da756fdd1b08228658c",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7",
            "size": 22330016,
            "upload_time": "2025-02-27T16:12:56",
            "upload_time_iso_8601": "2025-02-27T16:12:56.920812Z",
            "url": "https://files.pythonhosted.org/packages/a6/2d/921ffa25c9dff2c76468b353cd81f9a43bd6145a0a863bd2820548ea2ee4/eazyml_data_quality-0.0.30-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7a22070debe4fe853fdee294de71ad32dd53e8c14946c7800288dc583485e4ed",
                "md5": "67517fb8193209cdb1809085d6e5a020",
                "sha256": "b7caf9adf593c838676563fc07eb7f6d813b7aa78692c2d85c338e5271e06111"
            },
            "downloads": -1,
            "filename": "eazyml_data_quality-0.0.30.tar.gz",
            "has_sig": false,
            "md5_digest": "67517fb8193209cdb1809085d6e5a020",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 21666725,
            "upload_time": "2025-02-27T16:13:12",
            "upload_time_iso_8601": "2025-02-27T16:13:12.098385Z",
            "url": "https://files.pythonhosted.org/packages/7a/22/070debe4fe853fdee294de71ad32dd53e8c14946c7800288dc583485e4ed/eazyml_data_quality-0.0.30.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-27 16:13:12",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "eazyml-data-quality"
}
        