# spark_datax_tools
[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
spark_datax_tools is a Python library of helper functions for working with DataX schemas.
## Installation
The code is packaged for PyPI, so installation consists of running:
```sh
pip install spark-datax-tools
```
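A quick, optional way to confirm the install succeeded is to query the installed version with the standard library's `importlib.metadata` (this checks the distribution name used above; it does not import the package itself):

```python
from importlib import metadata

# Report the installed version of spark-datax-tools, if any.
try:
    print(metadata.version("spark-datax-tools"))
except metadata.PackageNotFoundError:
    print("spark-datax-tools is not installed")
```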
## Usage
The library provides wrapper functions for generating DataX artifacts. The examples below assume the `datax_*` helpers have been imported from the package.

### DataX nomenclature

```python
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"

datax_generated_nomenclature(table_name=table_name,
                             origen=origen,
                             destination=destination,
                             output=True)
```

### List of adapters

```python
datax_list_adapters()
```

### Generate a ticket adapter

```python
adapter_id = "ADAPTER_HDFS_OUTSTAGING"
parameter = {"uuaa": "na8z"}

datax_generated_ticket_adapter(adapter_id=adapter_id,
                               parameter=parameter,
                               is_dev=True)
```

### Generate a ticket transfer

```python
folder = "CR-PEMFIMEN-T02"
job_name = "PMFITP4012"
crq = "CRQ100000"
periodicity = "mensual"
hour = "10AM"
weight = "50MB"
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"

datax_generated_ticket_transfer(
    folder=folder,
    job_name=job_name,
    crq=crq,
    periodicity=periodicity,
    hour=hour,
    weight=weight,
    table_name=table_name,
    origen=origen,
    destination=destination,
    is_dev=True
)
```

### Generate a schema JSON for Artifactory

```python
path_json = "lclsupplierspurchases.output.schema"
is_schema_origen_in = True
schema_type = "host"
convert_string = False

datax_generated_schema_artifactory(
    path_json=path_json,
    is_schema_origen_in=is_schema_origen_in,
    schema_type=schema_type,
    convert_string=convert_string
)
```

### Generate a schema JSON for Datum

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("SparkAPP").getOrCreate()

path = "fields_pe_datum2.csv"
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"
storage_zone = "master"

datax_generated_schema_datum(
    spark=spark,
    path=path,
    table_name=table_name,
    origen=origen,
    destination=destination,
    storage_zone=storage_zone,
    convert_string=False
)
```
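As an illustration of the naming convention the table names above follow (`t_<uuaa>_<description>`), the parts can be separated with a plain string split. This is a hypothetical sketch, not the library's implementation; the field names `prefix`, `uuaa`, and `description` are assumptions for illustration only:

```python
# Hypothetical helper: split a DataX-style table name of the form
# t_<uuaa>_<description> into its components. Illustration only;
# not part of the spark_datax_tools API.
def split_table_name(table_name: str) -> dict:
    prefix, uuaa, *rest = table_name.split("_")
    return {"prefix": prefix, "uuaa": uuaa, "description": "_".join(rest)}

parts = split_table_name("t_pmfi_lcl_suppliers_purchases")
# parts["prefix"] == "t", parts["uuaa"] == "pmfi"
```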
## License
[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).
## New features v1.0
## BugFix
- On Windows, missing compiler dependencies can be installed with: `choco install visualcpp-build-tools`
## Reference
- Jonathan Quiza [github](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).
- Jonathan Quiza [linkedin](https://www.linkedin.com/in/jonaqp/).