eazyml-dq


- **Name**: eazyml-dq
- **Version**: 0.0.5
- **Home page**: https://eazyml.com/
- **Summary**: Python client for data quality
- **Upload time**: 2024-12-19 10:05:21
- **Author**: Eazyml
- **Requires Python**: >=3.8
- **Keywords**: python

# Eazyml Data Quality
![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)  ![PyPI package](https://img.shields.io/badge/pypi%20package-0.0.5-brightgreen) ![Code Style](https://img.shields.io/badge/code%20style-black-black)

## Overview
**eazyml-dq** is a Python utility that evaluates dataset quality by checking data shape, emptiness (missing values), outliers, balance, and correlation. It flags potential issues and reports detailed feedback so you can judge whether the data is ready for downstream processes.

## Features
- **Data Shape Quality**: Validates dataset dimensions and checks if the number of rows is sufficient relative to the number of columns.
- **Data Emptiness Check**: Identifies and reports missing values in the dataset.
- **Outlier Detection**: Detects outliers using statistical analysis and, when the `remove_outliers` option is enabled, removes them.
- **Data Balance Check**: Analyzes the balance of the dataset and computes a balance score.
- **Correlation Analysis**: Calculates correlations between features and provides alerts for highly correlated features.
- **Summary Alerts**: Consolidates key quality issues into a single summary for quick review.
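
To make the first two checks concrete, here is a minimal standard-library sketch of a shape check and a missing-value count. It is not the eazyml-dq implementation: the function name `shape_and_emptiness`, the returned keys, and the rule that an empty cell counts as missing are all assumptions made for illustration.

```python
import csv
import io


def shape_and_emptiness(csv_text):
    """Illustrative only: dimensions, a rows-vs-columns alert, and missing counts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Assumption: a cell that is empty (or absent) counts as a missing value.
    missing = {
        col: sum(1 for row in rows if (row[col] or "").strip() == "")
        for col in columns
    }
    return {
        "dimension": (len(rows), len(columns)),
        # Alert when there are fewer rows than columns.
        "shape_alert": len(rows) < len(columns),
        "missing_per_column": missing,
    }


sample = "age,income,outcome\n34,52000,1\n,61000,0\n45,,1\n29,48000,0\n"
report = shape_and_emptiness(sample)
print(report)
```

The library's own thresholds and imputation behavior may differ; use `ez_data_quality` for the real report.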

## Installation

eazyml-dq requires Python 3.8 or later. Install the package from PyPI:

```bash
pip install eazyml-dq
```

## Usage

### Function: `ez_data_quality`
This function evaluates the quality of the dataset provided and returns a detailed report.

#### Parameters:
- `filename` (str): Path to the CSV file containing the dataset to be analyzed.
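- `outcome` (str): Name of the target (outcome) column that data quality is assessed against.
- `options` (dict, optional): Configuration for the data quality checks. The example in this README sets the keys `data_shape`, `data_balance`, `data_emptiness`, `data_outliers`, `remove_outliers`, and `outcome_correlation`, each to `"yes"`.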



#### Returns:
A dictionary containing:
- `success` (bool): Indicates if the operation was successful.
- `message` (str): Describes the outcome or error encountered.
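- Detailed results, keyed as in the sample output:
  - `data_shape_quality`: **Data Shape Quality**
  - `data_emptiness_quality`: **Data Emptiness Quality**
  - `data_outliers_quality`: **Outliers Quality**
  - `data_balance_quality`: **Data Balance Quality**
  - `data_correlation_quality`: **Correlation Quality**
  - `data_bad_quality_alerts`: **Summary Alerts**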

### Example
```python
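# The module name below is reproduced from the original example; if the import
# fails, check which module the installed eazyml-dq package actually exposes.
from data_quality_checker import ez_data_quality

# Path to the CSV dataset and the name of its outcome (target) column
file_path = 'path/to/dataset.csv'
outcome = 'outcome_column_name'

# Checks to run; each key is set to "yes"
options = {
    "data_shape": "yes",
    "data_balance": "yes",
    "data_emptiness": "yes",
    "data_outliers": "yes",
    "remove_outliers": "yes",
    "outcome_correlation": "yes",
}

# Perform data quality checks
result = ez_data_quality(filename=file_path, outcome=outcome, options=options)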

# Access specific quality metrics
if result["success"]:
    print("Data Shape Quality:", result["data_shape_quality"])
    print("Outlier Quality:", result["data_outliers_quality"])
    print("Bad Quality Alerts:", result["data_bad_quality_alerts"])
else:
    print("Error:", result["message"])
```
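
The option keys used in the example above can also be assembled from boolean flags. The helper below is a hypothetical convenience and not part of eazyml-dq: `build_options` is an invented name, and leaving a key out to skip a check is an assumption, since this README only shows keys set to `"yes"`.

```python
# Option keys taken from the example above.
KNOWN_OPTIONS = (
    "data_shape",
    "data_balance",
    "data_emptiness",
    "data_outliers",
    "remove_outliers",
    "outcome_correlation",
)


def build_options(**checks):
    """Hypothetical helper: map boolean flags to a "yes"-valued options dict."""
    unknown = set(checks) - set(KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    # Assumption: omitting a key skips that check. How the library treats
    # omitted keys, or values other than "yes", is not documented here.
    return {key: "yes" for key in KNOWN_OPTIONS if checks.get(key)}


options = build_options(data_shape=True, data_emptiness=True, remove_outliers=False)
print(options)
```

Rejecting unknown keys early catches typos such as `data_outlier` before the call reaches the library.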

### Sample Output
#### On Success:
```json
{
    "success": true,
    "message": "Data quality checks completed successfully.",
    "data_shape_quality": {
        "Dataset_dimension": [".."],
        "alert": "true",
        "message": "No of columns in dataset is not adequate because the no of rows in the dataset is less than the no of columns",
        "success": true
    },
    "data_emptiness_quality": {
        "message": "There are no missing values present in the training data that was uploaded. Hence no records were imputed.",
        "success": true
    },
    "data_outliers_quality": {
        "message": "The following data points were removed as outliers.",
        "outliers": {
            "columns": [".."],
            "indices": [".."]
        },
        "success": true
    },
    "data_balance_quality": {
        "data_balance": {
            "data_balance_analysis": {
                "balance_score": "''",
                "data_balance": true,
                "decision_threshold": "..",
                "quality_message": "Uploaded data is balanced because the balance score is greater than given threshold"
            }
        },
        "message": "Data balance has been checked successfully",
        "success": true
    },
    "data_correlation_quality": {
        "data_correlation": {
            "Column X": {
                "Column Y": ".."
            }
        },
        "data_correlation_alert": "true",
        "message": "Correlation has been calculated successfully between all features and all features with outcome",
        "success": true
    },
    "data_bad_quality_alerts": {
        "data_shape_alert": "true",
        "data_balance_alert": "false",
        "data_emptiness_alert": "false",
        "data_outliers_alert": "true",
        "data_correlation_alert": "true"
    }
}
```
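
In the sample above, the values under `data_bad_quality_alerts` are the strings `"true"` and `"false"`, not booleans, so a plain truthiness test would treat every alert as raised. The sketch below lists only the raised alerts; `raised_alerts` is a hypothetical helper name, and it assumes the result has the shape shown in the sample output.

```python
def raised_alerts(result):
    """Return the sorted names of alerts whose value is "true" (string or bool)."""
    alerts = result.get("data_bad_quality_alerts", {})
    # str() + lower() accepts both the string "true" and the boolean True.
    return sorted(
        name for name, value in alerts.items()
        if str(value).strip().lower() == "true"
    )


# Alert values copied from the sample output above.
sample_result = {
    "success": True,
    "data_bad_quality_alerts": {
        "data_shape_alert": "true",
        "data_balance_alert": "false",
        "data_emptiness_alert": "false",
        "data_outliers_alert": "true",
        "data_correlation_alert": "true",
    },
}

print(raised_alerts(sample_result))
# ['data_correlation_alert', 'data_outliers_alert', 'data_shape_alert']
```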

#### On Failure:
```json
{
    "success": false,
    "message": "Error message describing the failure."
}
```
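
Because failures are reported through the returned dictionary rather than through an exception, callers that prefer exceptions can wrap the result. The following is a minimal sketch; `DataQualityError` and `require_success` are hypothetical names, not part of eazyml-dq.

```python
class DataQualityError(RuntimeError):
    """Hypothetical exception for a failed data quality run."""


def require_success(result):
    """Return the result unchanged, or raise if it reports a failure."""
    if not isinstance(result, dict):
        raise DataQualityError(f"Unexpected result: {result!r}")
    if not result.get("success"):
        # Use the library's message when present.
        raise DataQualityError(result.get("message", "unknown error"))
    return result


failure = {"success": False, "message": "Error message describing the failure."}
try:
    require_success(failure)
except DataQualityError as exc:
    error_text = str(exc)
    print("Data quality run failed:", error_text)
```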

## License
This project is licensed under the MIT License. See the `LICENSE` file for details.

## Contribution
Contributions are welcome! Please submit a pull request or open an issue to discuss your ideas.


            

Raw data

            {
    "_id": null,
    "home_page": "https://eazyml.com/",
    "name": "eazyml-dq",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "python",
    "author": "Eazyml",
    "author_email": "admin@ipsoftlabs.com",
    "download_url": "https://files.pythonhosted.org/packages/28/1b/4db108e751c05622054af484d6029b04d5dfddf0eba67aff03027a3a7fbf/eazyml_dq-0.0.5.tar.gz",
    "platform": null,
    "description": "# Eazyml Data Quality\n![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)  ![PyPI package](https://img.shields.io/badge/pypi%20package-0.0.5-brightgreen) ![Code Style](https://img.shields.io/badge/code%20style-black-black)\n\n## Overview\nThe **eazyml-dq** is a Python utility designed to evaluate the quality of datasets by performing various checks such as data shape, emptiness, outlier detection, balance, and correlation. It helps users identify potential issues in their datasets and provides detailed feedback to ensure data readiness for downstream processes.\n\n## Features\n- **Data Shape Quality**: Validates dataset dimensions and checks if the number of rows is sufficient relative to the number of columns.\n- **Data Emptiness Check**: Identifies and reports missing values in the dataset.\n- **Outlier Detection**: Detects and removes outliers based on statistical analysis.\n- **Data Balance Check**: Analyzes the balance of the dataset and computes a balance score.\n- **Correlation Analysis**: Calculates correlations between features and provides alerts for highly correlated features.\n- **Summary Alerts**: Consolidates key quality issues into a single summary for quick review.\n\n## Installation\n\nTo use the Data Quality Checker, ensure you have Python installed on your system. 
Then, install the required dependencies:\n\n```bash\npip install eazyml-dq\n```\n\n## Usage\n\n### Function: `ez_data_quality`\nThis function evaluates the quality of the dataset provided and returns a detailed report.\n\n#### Parameters:\n- `filename` (str): Path to the CSV file containing the dataset to be analyzed.\n- `outcome` (str): The target variable (outcome) to assess data quality against.\n- `options` (dict, optional): A dictionary specifying additional configurations for data quality checks.\n\n\n\n#### Returns:\nA dictionary containing:\n- `success` (bool): Indicates if the operation was successful.\n- `message` (str): Describes the outcome or error encountered.\n- Detailed results for:\n  - **Data Shape Quality**\n  - **Data Emptiness Quality**\n  - **Outliers Quality**\n  - **Data Balance Quality**\n  - **Correlation Quality**\n  - **Summary Alerts**\n\n### Example\n```python\nfrom data_quality_checker import ez_data_quality\n\n# Specify the file path for the dataset\nfile_path = 'path/to/dataset.csv'\noutcome = 'outcome_column_name'\noptions = {\n      \"data_shape\": \"yes\",\n      \"data_balance\": \"yes\",\n      \"data_emptiness\": \"yes\",\n      \"data_outliers\": \"yes\",\n      \"remove_outliers\": \"yes\",\n      \"outcome_correlation\": \"yes\"\n      )\n\n# Perform data quality checks\nresult = ez_data_quality(filename=file_path, outcome = outcome, options = options)\n\n# Access specific quality metrics\nif result[\"success\"]:\n    print(\"Data Shape Quality:\", result[\"data_shape_quality\"])\n    print(\"Outlier Quality:\", result[\"data_outliers_quality\"])\n    print(\"Bad Quality Alerts:\", result[\"data_bad_quality_alerts\"])\nelse:\n    print(\"Error:\", result[\"message\"])\n```\n\n### Sample Output\n#### On Success:\n```json\n{\n    \"success\": true,\n    \"message\": \"Data quality checks completed successfully.\",\n    \"data_shape_quality\": {\n        \"Dataset_dimension\": [\"..\"],\n        \"alert\": \"true\",\n        
\"message\": \"No of columns in dataset is not adequate because the no of rows in the dataset is less than the no of columns\",\n        \"success\": true\n    },\n    \"data_emptiness_quality\": {\n        \"message\": \"There are no missing values present in the training data that was uploaded. Hence no records were imputed.\",\n        \"success\": true\n    },\n    \"data_outliers_quality\": {\n        \"message\": \"The following data points were removed as outliers.\",\n        \"outliers\": {\n            \"columns\": [\"..\"],\n            \"indices\": [\"..\"]\n        },\n        \"success\": true\n    },\n    \"data_balance_quality\": {\n        \"data_balance\": {\n            \"data_balance_analysis\": {\n                \"balance_score\": \"''\",\n                \"data_balance\": true,\n                \"decision_threshold\": \"..\",\n                \"quality_message\": \"Uploaded data is balanced because the balance score is greater than given threshold\"\n            }\n        },\n        \"message\": \"Data balance has been checked successfully\",\n        \"success\": true\n    },\n    \"data_correlation_quality\": {\n        \"data_correlation\": {\n            \"Column X\": {\n                \"Column Y\": \"..\"\n            }\n        },\n        \"data_correlation_alert\": \"true\",\n        \"message\": \"Correlation has been calculated successfully between all features and all features with outcome\",\n        \"success\": true\n    },\n    \"data_bad_quality_alerts\": {\n        \"data_shape_alert\": \"true\",\n        \"data_balance_alert\": \"false\",\n        \"data_emptiness_alert\": \"false\",\n        \"data_outliers_alert\": \"true\",\n        \"data_correlation_alert\": \"true\"\n    }\n}\n```\n\n#### On Failure:\n```json\n{\n    \"success\": false,\n    \"message\": \"Error message describing the failure.\"\n}\n```\n\n## License\nThis project is licensed under the MIT License. 
See the `LICENSE` file for details.\n\n## Contribution\nContributions are welcome! Please submit a pull request or open an issue to discuss your ideas.\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Python client for data quality",
    "version": "0.0.5",
    "project_urls": {
        "Documentation": "https://eazyml-docs.readthedocs.io/en/latest/",
        "Homepage": "https://eazyml.com/"
    },
    "split_keywords": [
        "python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a481c932f758cf524f3567f524abf7eb9a3c89eabcf70430cffe76d552f27f7e",
                "md5": "3b265e01373b4539bb2917f3d015b013",
                "sha256": "2b7ca44f68031b80e71cef4dfe18b8c1ffd26895e2302bba7f7fef89247e2e34"
            },
            "downloads": -1,
            "filename": "eazyml_dq-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3b265e01373b4539bb2917f3d015b013",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8808086,
            "upload_time": "2024-12-19T10:05:14",
            "upload_time_iso_8601": "2024-12-19T10:05:14.060049Z",
            "url": "https://files.pythonhosted.org/packages/a4/81/c932f758cf524f3567f524abf7eb9a3c89eabcf70430cffe76d552f27f7e/eazyml_dq-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "281b4db108e751c05622054af484d6029b04d5dfddf0eba67aff03027a3a7fbf",
                "md5": "32168e44cd95179372c7af7129fcdbee",
                "sha256": "ebae2264e439eeac427b05cd2489802fa472bcbed370e4a7e1d35a3321c69a65"
            },
            "downloads": -1,
            "filename": "eazyml_dq-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "32168e44cd95179372c7af7129fcdbee",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 8589127,
            "upload_time": "2024-12-19T10:05:21",
            "upload_time_iso_8601": "2024-12-19T10:05:21.151416Z",
            "url": "https://files.pythonhosted.org/packages/28/1b/4db108e751c05622054af484d6029b04d5dfddf0eba67aff03027a3a7fbf/eazyml_dq-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-19 10:05:21",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "eazyml-dq"
}
        