# spark_hdfs_tools
[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Updates](https://pyup.io/repos/github/woctezuma/google-colab-transfer/shield.svg)](pyup)
[![Python 3](https://pyup.io/repos/github/woctezuma/google-colab-transfer/python-3-shield.svg)](pyup)
[![Code coverage](https://codecov.io/gh/woctezuma/google-colab-transfer/branch/master/graph/badge.svg)](codecov)
spark_hdfs_tools is a Python library that provides wrappers for working with the HDFS filesystem in a sandbox environment.
## Installation
The code is packaged for PyPI, so it can be installed with `pip install spark-hdfs-tools`.
## Usage
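The package exposes wrapper functions for running the HDFS filesystem in a sandbox. The sections below walk through them in order.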
## Sandbox
## Installation
```sh
pip uninstall -y spark-hdfs-tools
```
```sh
pip install spark-hdfs-tools --user --upgrade
```
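To confirm that the install succeeded, the installed version can be read from the package metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of spark_hdfs_tools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "spark-hdfs-tools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "spark-hdfs-tools is not installed")
```

A `None` result after installing with `--user` usually means the notebook kernel needs a restart before the package becomes visible.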
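## Imports
```python
import os
import pyspark
from spark_hdfs_tools import dq_path_workspace
from spark_hdfs_tools import dq_download_jar
from spark_hdfs_tools import dq_spark_session
# Assumed to be importable from the package top level, like the helpers above;
# they are used in the Validate Conf, Extract Params and Run steps.
from spark_hdfs_tools import dq_validate_conf
from spark_hdfs_tools import dq_extract_parameters
from spark_hdfs_tools import dq_run_sandbox
```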
## Variables
```python
project_sda = "SDA_37036"
url_conf = "http://artifactory-gdt.central-02.nextgen.igrupobbva/artifactory/gl-datio-spark-libs-maven-local/com/datiobd/cdd-hammurabi/4.0.9/DQ_LOCAL_CONFS/KCOG/KCOG_branch_MRField.conf"
```
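Since `url_conf` is passed to several functions later on, it can help to catch a malformed value early. The following is a minimal sketch using only the standard library. The helper `describe_conf_url` is hypothetical, and the rule that the file name ends in `.conf` is an assumption taken from the example URL above, not a documented requirement of spark_hdfs_tools.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def describe_conf_url(url_conf: str) -> dict:
    """Split a configuration URL into host and file name, rejecting malformed input."""
    parts = urlparse(url_conf)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url_conf!r}")
    file_name = PurePosixPath(parts.path).name
    # Assumption: configuration files use the .conf extension.
    if not file_name.endswith(".conf"):
        raise ValueError(f"expected a .conf file, got: {file_name!r}")
    return {"host": parts.netloc, "file_name": file_name}


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(describe_conf_url("http://example.com/confs/sample.conf"))
```

This check only inspects the string; it does not contact the server or read the configuration.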
## Creating Workspace
```python
dq_path_workspace(project_sda=project_sda)
```
## Download haas jar
```python
dq_download_jar(haas_version="4.8.0", force=True)
```
## Spark Session
```python
spark, sc = dq_spark_session()
```
## Validate Conf
```python
dq_validate_conf(url_conf=url_conf)
```
## Extract Params
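```python
# Assumption: the return value is the parameter_conf_list used in the Run step.
parameter_conf_list = dq_extract_parameters(url_conf=url_conf)
```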
## Run
```python
dq_run_sandbox(
    spark=spark,
    sc=sc,
    parameter_conf_list=parameter_conf_list,
    url_conf=url_conf,
)
```
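The steps above run in a fixed order, and a failure in an early step makes the later ones pointless. One way to make that explicit in a notebook is a small runner that stops at the first failing step and names it. This is only a sketch: `run_steps` is a hypothetical helper, not part of spark_hdfs_tools, and the lambdas stand in for the `dq_*` calls.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], Any]]


def run_steps(steps: List[Step]) -> Dict[str, Any]:
    """Run named steps in order; stop at the first failure and say which step failed."""
    results: Dict[str, Any] = {}
    for name, step in steps:
        try:
            results[name] = step()
        except Exception as exc:
            raise RuntimeError(f"step {name!r} failed: {exc}") from exc
    return results


if __name__ == "__main__":
    # Placeholder callables; in a notebook these would wrap the dq_* calls.
    demo = run_steps([
        ("workspace", lambda: "workspace ready"),
        ("session", lambda: "session ready"),
    ])
    print(demo)
```

Passing each step as a zero-argument callable keeps the runner independent of the library, so it can be tried out before the package is installed.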
## License
[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).
## New features v1.0
## BugFix
- Windows: install the Visual C++ build tools with Chocolatey (`choco install visualcpp-build-tools`).
## Reference
- Jonathan Quiza [GitHub](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).
Raw data
{
"_id": null,
"home_page": "https://github.com/jonaqp/spark_hdfs_tools/",
"name": "spark-hdfs-tools",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "spark,hdfs,dfs",
"author": "Jonathan Quiza",
"author_email": "jony327@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/75/df/f2b9042b4cef74b65bce333d7f9ee9da0094d0d4f71be1647aaad40cf848/spark_hdfs_tools-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "spark_hdfs_tools",
"version": "0.1.1",
"project_urls": {
"Download": "https://github.com/jonaqp/spark_hdfs_tools/archive/main.zip",
"Homepage": "https://github.com/jonaqp/spark_hdfs_tools/"
},
"split_keywords": [
"spark",
"hdfs",
"dfs"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8efdeac587910119923aff36cae920a913ccd30d607ec7fe2564358f18ea9000",
"md5": "24f601c0a934b6088d8766300f7612ad",
"sha256": "ccc612e9c22391614ab48bc4115e90829dcda6ef354f50aff90377441ee05ef5"
},
"downloads": -1,
"filename": "spark_hdfs_tools-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "24f601c0a934b6088d8766300f7612ad",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6927,
"upload_time": "2023-07-31T05:49:56",
"upload_time_iso_8601": "2023-07-31T05:49:56.986802Z",
"url": "https://files.pythonhosted.org/packages/8e/fd/eac587910119923aff36cae920a913ccd30d607ec7fe2564358f18ea9000/spark_hdfs_tools-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "75dff2b9042b4cef74b65bce333d7f9ee9da0094d0d4f71be1647aaad40cf848",
"md5": "565a73230fab3787c0301dc06f187b2d",
"sha256": "205146ccc1ec5a4e16a703ab73aaac179e182dbb8aaafa2cd10f280d7ef25786"
},
"downloads": -1,
"filename": "spark_hdfs_tools-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "565a73230fab3787c0301dc06f187b2d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7377,
"upload_time": "2023-07-31T05:49:58",
"upload_time_iso_8601": "2023-07-31T05:49:58.867797Z",
"url": "https://files.pythonhosted.org/packages/75/df/f2b9042b4cef74b65bce333d7f9ee9da0094d0d4f71be1647aaad40cf848/spark_hdfs_tools-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-31 05:49:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jonaqp",
"github_project": "spark_hdfs_tools",
"github_not_found": true,
"lcname": "spark-hdfs-tools"
}