# Data-Quality-Kit
## Functional Description
A library of functions for managing and improving data quality in datasets.
## Owner
For any bugs or questions, please reach out to [Dante Pedrozo](mailto:dante.victor.33@gmail.com)
## Branching Methodology
This project follows a simplified Git Flow branching methodology:
- **Master Branch**: production code
- **Develop Branch**: main integration branch for ongoing development. Features and fixes are merged into this branch before reaching master
- **Feature Branch**: created from develop branch to work on new features
## Prerequisites
This project uses:
- Language: Python 3.10
- Libraries:
  - pandas
  - pytest
  - assertpy
## How to use it
Install the library:
```bash
pip install data-quality-kit
```
```python
from data_quality_quick.validate_formats import check_type_format
```
## Functionalities
- **Completeness**
  - **assert_that_dataframe_is_empty**: Checks whether a DataFrame is empty.
- **Validity**
  - **assert_that_there_are_not_nulls**: Checks for null values in a specified column of a DataFrame.
- **Consistency**
  - **assert_that_there_are_not_duplicates**: Checks for duplicate values in the specified primary key column of a DataFrame.
  - **assert_that_columns_values_match**: Checks whether all values in `column2` of `df2` are present in `column1` of `df1`.
- **Accuracy**
  - **assert_that_type_value**: Checks whether all non-null entries in a specified column of a DataFrame are of the specified data type.
  - **assert_that_values_in_catalog**: Checks whether all values in the specified column of a DataFrame are present in a catalog (list of values).
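
To make the checks listed above concrete, here is a minimal sketch of the conditions they describe, written with plain pandas. The function names, parameters, and sample data are hypothetical and chosen only for illustration. They are not the library's API, whose exact signatures and failure behavior are not documented here.

```python
import pandas as pd

# Hypothetical helpers: each returns True when the described condition holds.
# They only illustrate the underlying checks and are NOT the library's functions.


def is_empty(df: pd.DataFrame) -> bool:
    """Completeness: the DataFrame has no rows or no columns."""
    return df.empty


def has_no_nulls(df: pd.DataFrame, column: str) -> bool:
    """Validity: the column contains no null values."""
    return not df[column].isna().any()


def has_no_duplicates(df: pd.DataFrame, key_column: str) -> bool:
    """Consistency: the primary key column contains no repeated values."""
    return not df[key_column].duplicated().any()


def values_match(df1: pd.DataFrame, column1: str,
                 df2: pd.DataFrame, column2: str) -> bool:
    """Consistency: every value in df2[column2] appears in df1[column1]."""
    return bool(df2[column2].isin(df1[column1]).all())


def values_have_type(df: pd.DataFrame, column: str, expected_type: type) -> bool:
    """Accuracy: every non-null entry in the column is an instance of expected_type."""
    return all(isinstance(value, expected_type) for value in df[column].dropna())


def values_in_catalog(df: pd.DataFrame, column: str, catalog: list) -> bool:
    """Accuracy: every value in the column is present in the catalog."""
    return bool(df[column].isin(catalog).all())


# Made-up sample data for the sketch.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "status": ["open", "closed", None],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12]})

results = {
    "empty": is_empty(orders),
    "status_has_no_nulls": has_no_nulls(orders, "status"),
    "order_id_is_unique": has_no_duplicates(orders, "order_id"),
    "customers_exist": values_match(customers, "customer_id", orders, "customer_id"),
    "status_is_str": values_have_type(orders, "status", str),
    "customer_in_catalog": values_in_catalog(orders, "customer_id", [10, 11]),
}
```

In a pytest suite, conditions like these would typically be wrapped in assertions so that a failed check fails the test.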
## Package Metadata
- **Name**: data-quality-kit
- **Version**: 0.7.0
- **License**: Apache License 2.0
- **Author**: DantePedrozo (dante.victor.33@gmail.com)
- **Keywords**: Data Quality
- **Homepage**: https://github.com/Dante33CTP/data-quality-kit
- **Uploaded**: 2024-10-08
- **Distribution files**:
  - `data_quality_kit-0.7.0-py3-none-any.whl` (wheel, 16387 bytes), SHA-256 `26d60f7448d9dbd382b4edf5fe9a04d03d586cc046f568fee68f7acd2cbb3b93`
  - `data_quality_kit-0.7.0.tar.gz` (sdist, 8862 bytes), SHA-256 `3a493c86a6cd3b8115db336011cc5fc049b6f189504e749fbe1108a9d213aa83`