data-quality-kit


- **Name**: data-quality-kit
- **Version**: 0.7.0
- **Home page**: https://github.com/Dante33CTP/data-quality-kit
- **Summary**: library of functions for managing and improving data quality in Datasets
- **Upload time**: 2024-10-08 18:27:46
- **Author**: DantePedrozo
- **License**: Apache License 2.0
- **Keywords**: data quality
- **Requirements**: No requirements were recorded.

# Data-Quality-Kit

## Functional Description
A library of functions for managing and improving data quality in datasets.

## Owner
For any bugs or questions, please reach out to [Dante Pedrozo](mailto:dante.victor.33@gmail.com)

## Branching Methodology
This project follows a simplified Git Flow branching methodology:
- **Master Branch**: production code.
- **Develop Branch**: main integration branch for ongoing development. Features and fixes are merged into this branch before reaching master.
- **Feature Branch**: created from the develop branch to work on new features.

## Prerequisites
This project uses:
- Language: Python 3.10
- Libraries: 
  - pandas
  - pytest
  - assertpy

## How to use it
Install the library:

```bash
pip install data-quality-kit
```

Then import the function you need:

```python
from data_quality_quick.validate_formats import check_type_format
```

## Functionalities

- **Completeness**
  - **assert_that_dataframe_is_empty**: Checks whether a DataFrame is empty.
- **Validity**
  - **assert_that_there_are_not_nulls**: Checks for null values in a specified column of a DataFrame.
- **Consistency**
  - **assert_that_there_are_not_duplicates**: Checks for duplicate values in the specified primary key column of a DataFrame.
  - **assert_that_columns_values_match**: Checks whether all values in `column2` of `df2` are present in `column1` of `df1`.
- **Accuracy**
  - **assert_that_type_value**: Checks whether all non-null entries in a specified column of a DataFrame are of the specified data type.
  - **assert_that_values_in_catalog**: Checks whether all values in the specified column of a DataFrame are present in a catalog (a list of values).
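
To make the checks above concrete, here is a minimal sketch of the same ideas written with plain pandas. It is not the library's implementation, and it does not assume the library's signatures: the helper names (`has_nulls`, `has_duplicates`, `values_match`, `values_have_type`, `values_in_catalog`) and the sample DataFrames are hypothetical and exist only for illustration.

```python
import pandas as pd


def has_nulls(df: pd.DataFrame, column: str) -> bool:
    """Validity: True if the column contains at least one null value."""
    return bool(df[column].isna().any())


def has_duplicates(df: pd.DataFrame, key_column: str) -> bool:
    """Consistency: True if the key column contains repeated values."""
    return bool(df[key_column].duplicated().any())


def values_match(df1: pd.DataFrame, column1: str,
                 df2: pd.DataFrame, column2: str) -> bool:
    """Consistency: True if every value in df2[column2] appears in df1[column1]."""
    return bool(df2[column2].isin(df1[column1]).all())


def values_have_type(df: pd.DataFrame, column: str, expected_type: type) -> bool:
    """Accuracy: True if every non-null entry is an instance of expected_type."""
    return all(isinstance(value, expected_type) for value in df[column].dropna())


def values_in_catalog(df: pd.DataFrame, column: str, catalog: list) -> bool:
    """Accuracy: True if every value in the column appears in the catalog."""
    return bool(df[column].isin(catalog).all())


# Hypothetical sample data, used only to exercise the helpers.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "country": ["MX", "US", None]}
)
orders = pd.DataFrame(
    {
        "order_id": [10, 11, 11],
        "customer_id": [1, 3, 3],
        "status": ["paid", "paid", "open"],
    }
)

# Completeness: pandas exposes emptiness directly through DataFrame.empty.
customers_is_empty = customers.empty

null_found = has_nulls(customers, "country")
duplicate_found = has_duplicates(orders, "order_id")
keys_match = values_match(customers, "customer_id", orders, "customer_id")
countries_are_str = values_have_type(customers, "country", str)
status_in_catalog = values_in_catalog(orders, "status", ["paid", "open"])
```

These helpers return booleans so the logic is easy to inspect. The library's functions are named `assert_that_...`, which suggests they raise on failure instead, but their exact behavior is not described here.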


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Dante33CTP/data-quality-kit",
    "name": "data-quality-kit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Data Quality",
    "author": "DantePedrozo",
    "author_email": "dante.victor.33@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/9a/79/e6c9d51d0d0a499b03fed654e509449d2917328b93f580db4286207c490e/data_quality_kit-0.7.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "library of functions for managing and improving data quality in Datasets",
    "version": "0.7.0",
    "project_urls": {
        "Homepage": "https://github.com/Dante33CTP/data-quality-kit"
    },
    "split_keywords": [
        "data",
        "quality"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7560790abb382afab93910d406664ed671fbb3587e27929e106fa6e850ba6d89",
                "md5": "493e78cc29979e2f5f7bf19c85244bf6",
                "sha256": "26d60f7448d9dbd382b4edf5fe9a04d03d586cc046f568fee68f7acd2cbb3b93"
            },
            "downloads": -1,
            "filename": "data_quality_kit-0.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "493e78cc29979e2f5f7bf19c85244bf6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 16387,
            "upload_time": "2024-10-08T18:27:43",
            "upload_time_iso_8601": "2024-10-08T18:27:43.244176Z",
            "url": "https://files.pythonhosted.org/packages/75/60/790abb382afab93910d406664ed671fbb3587e27929e106fa6e850ba6d89/data_quality_kit-0.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9a79e6c9d51d0d0a499b03fed654e509449d2917328b93f580db4286207c490e",
                "md5": "322558ff07421e4bc35ae8e1144aee0f",
                "sha256": "3a493c86a6cd3b8115db336011cc5fc049b6f189504e749fbe1108a9d213aa83"
            },
            "downloads": -1,
            "filename": "data_quality_kit-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "322558ff07421e4bc35ae8e1144aee0f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8862,
            "upload_time": "2024-10-08T18:27:46",
            "upload_time_iso_8601": "2024-10-08T18:27:46.755988Z",
            "url": "https://files.pythonhosted.org/packages/9a/79/e6c9d51d0d0a499b03fed654e509449d2917328b93f580db4286207c490e/data_quality_kit-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-08 18:27:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Dante33CTP",
    "github_project": "data-quality-kit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "data-quality-kit"
}
        