spark-quality-rules-tools


- Name: spark-quality-rules-tools
- Version: 0.9.11
- Home page: https://github.com/jonaqp/spark_quality_rules_tools/
- Summary: spark_quality_rules_tools
- Upload time: 2024-06-05 05:37:51
- Author: Jonathan Quiza
- Keywords: spark, dq, rules, hammurabies
- Requirements: none recorded

# spark_quality_rules_tools

[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

spark_quality_rules_tools is a Python library for running data quality rules in a sandbox environment.

## Installation

The package is published on PyPI and can be installed with `pip install spark-quality-rules-tools`.


## Usage

The library is a wrapper for running Hammurabi (`hammurabies`) data quality rules.

## Sandbox
## Installation
```sh
pip uninstall -y spark-quality-rules-tools
```

```sh
pip install spark-quality-rules-tools --user --upgrade
```
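
After an uninstall and reinstall, it can help to confirm which version the active environment actually picked up. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example and is not part of spark_quality_rules_tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("spark-quality-rules-tools"))
```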

## Imports
```python
import os

import pyspark

from spark_quality_rules_tools import (
    dq_download_jar,
    dq_extract_parameters,
    dq_path_workspace,
    dq_run_sandbox,
    dq_spark_session,
    dq_validate_conf,
    dq_validate_rules,
    show_spark_df,
)

# Expose show_spark_df on every DataFrame as df.show2(...)
pyspark.sql.dataframe.DataFrame.show2 = show_spark_df
```
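
The last line of the imports attaches `show_spark_df` to PySpark's `DataFrame` class, so every DataFrame gains a `show2` method. The sketch below illustrates the same pattern on a stand-in class so that it runs without Spark. `FakeFrame` and `show_rows` are hypothetical names used only for illustration and say nothing about how `show_spark_df` itself is implemented.

```python
class FakeFrame:
    """Stand-in for a DataFrame: it just holds a list of rows."""

    def __init__(self, rows):
        self.rows = rows


def show_rows(self, limit=20):
    """Return the first `limit` rows as text, one row per line."""
    return "\n".join(str(row) for row in self.rows[:limit])


# Assigning a plain function to a class attribute makes it a method:
# `self` is bound to the instance on each call.
FakeFrame.show2 = show_rows

frame = FakeFrame([("PE", 1), ("PE", 2), ("PE", 3)])
print(frame.show2(2))
```

Patching a third-party class this way is convenient in a notebook, but it affects every DataFrame in the session, so it is best done once, right after the imports.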

## Variables
```python
project_sda = "SDA_37036"
url_conf = "http://artifactory-gdt.central-02.nextgen.igrupobbva/artifactory/gl-datio-spark-libs-maven-local/com/datiobd/cdd-hammurabi/4.0.9/DQ_LOCAL_CONFS/KCOG/KCOG_branch_MRField.conf"
```


## Creating Workspace
```python
dq_path_workspace(project_sda=project_sda)
```


## Download haas jar
```python
dq_download_jar(haas_version="4.8.0", force=True)
```


## Spark Session
```python
spark, sc = dq_spark_session()
```


## Validate Conf
```python
dq_validate_conf(url_conf=url_conf)
```


## Extract Params
```python
dq_extract_parameters(url_conf=url_conf)
```


## JSON params
```python
parameter_conf_list = [
    {
        "ARTIFACTORY_UNIQUE_CACHE": "http://artifactory-gdt.central-02.nextgen.igrupobbva",
        "ODATE_DATE": "2022-11-11",
        "COUNTRY_ID": "PE",
        "SCHEMA_PATH": "t_kcog_branch.output.schema",
        "CUTOFF_DATE": "2022-11-11",
        "SCHEMAS_REPOSITORY": "gl-datio-da-generic-local/schemas/pe/kcog/master/t_kcog_branch/latest/",
    }
]
```
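
Because `parameter_conf_list` is a list, it can hold more than one parameter set. Whether `dq_run_sandbox` processes each entry in turn is an assumption here, not something this README states. The sketch below builds one entry per date from a shared template. `build_parameter_list` is a hypothetical helper, and setting `ODATE_DATE` and `CUTOFF_DATE` to the same value simply mirrors the example above.

```python
from datetime import date


def build_parameter_list(template, iso_dates):
    """Return one parameter dict per date, copying `template` each time.

    Each date string is parsed with date.fromisoformat, which raises
    ValueError on strings it cannot parse, so typos fail early.
    """
    params = []
    for iso in iso_dates:
        date.fromisoformat(iso)  # validation only; the string is kept as-is
        entry = dict(template)   # shallow copy so the template stays untouched
        entry["ODATE_DATE"] = iso
        entry["CUTOFF_DATE"] = iso
        params.append(entry)
    return params


template = {
    "COUNTRY_ID": "PE",
    "SCHEMA_PATH": "t_kcog_branch.output.schema",
}
parameter_conf_list = build_parameter_list(template, ["2022-11-10", "2022-11-11"])
print(len(parameter_conf_list))
```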


## Run
```python
dq_run_sandbox(spark=spark,
               sc=sc,
               parameter_conf_list=parameter_conf_list,
               url_conf=url_conf)
```

               
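The example below loads a CSV report from the workspace `data_reports` directory and displays it with `show2`:

```python
df = spark.read.csv(
    "file:/var/sds/homes/P030772/workspace/data_quality_rules/data_reports/KCOG/KCOG_BRANCH_MRFIELD_202304120046_20221111.csv",
    header=True,
)
df.show2(100)
```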
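
The report path above is specific to one run and one user. The sketch below locates the most recently modified CSV in a reports directory, assuming reports are written as `.csv` files under a single directory as in that path. `latest_report` is a hypothetical helper, and the directory layout on other machines is not confirmed by this README.

```python
from pathlib import Path


def latest_report(report_dir):
    """Return the most recently modified .csv file in report_dir, or None."""
    candidates = sorted(
        Path(report_dir).glob("*.csv"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None
```

The returned path can then be passed to `spark.read.csv` with a `file:` prefix, in the same way as the hard-coded path.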


## Validate Rules
```python
dq_validate_rules(url_conf=url_conf)
```


## License

[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).

## New features v1.0

## BugFix

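- If installation fails on Windows because the Visual C++ build tools are missing, install them with Chocolatey: `choco install visualcpp-build-tools`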

## Reference

- Jonathan Quiza [github](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).



            
