spark-datax-tools

Name: spark-datax-tools
Version: 0.7.0 (PyPI)
Home page: https://github.com/jonaqp/spark_datax_tools/
Summary: spark_datax_tools
Author: Jonathan Quiza
Upload time: 2024-09-19 00:39:50
Keywords: spark, datax, schema
            # spark_datax_tools


[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)




spark_datax_tools is a Python library that implements utilities for generating DataX schemas and tickets.

## Installation

The code is packaged for PyPI, so installation consists of running:
```sh
pip install spark-datax-tools
```
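The distribution name on PyPI uses hyphens, while the importable package name is assumed (from the GitHub project name) to be `spark_datax_tools` with underscores. A quick way to check whether the install succeeded:

```python
import importlib.util

# "spark-datax-tools" is the PyPI distribution name; the importable
# module is assumed to be "spark_datax_tools" (underscores).
spec = importlib.util.find_spec("spark_datax_tools")
status = "installed" if spec is not None else "not installed"
print(status)
```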


## Usage

The library provides wrapper functions for DataX nomenclature, tickets, and schemas (the import path below is an assumption; the examples follow the package's documented calls):

```python
from pyspark.sql import SparkSession

# Assumed top-level exports of the package:
from spark_datax_tools import (
    datax_generated_nomenclature,
    datax_list_adapters,
    datax_generated_ticket_adapter,
    datax_generated_ticket_transfer,
    datax_generated_schema_artifactory,
    datax_generated_schema_datum,
)

# DataX nomenclature
table_name = "t_pmfi_lcl_suppliers_purchases"
origen = "host"
destination = "hdfs"
datax_generated_nomenclature(table_name=table_name,
                             origen=origen,
                             destination=destination,
                             output=True)

# List of adapters
datax_list_adapters()

# Generate a ticket adapter
adapter_id = "ADAPTER_HDFS_OUTSTAGING"
parameter = {"uuaa": "na8z"}
datax_generated_ticket_adapter(adapter_id=adapter_id,
                               parameter=parameter,
                               is_dev=True)

# Generate a ticket transfer
folder = "CR-PEMFIMEN-T02"
job_name = "PMFITP4012"
crq = "CRQ100000"
periodicity = "mensual"
hour = "10AM"
weight = "50MB"
datax_generated_ticket_transfer(
    folder=folder,
    job_name=job_name,
    crq=crq,
    periodicity=periodicity,
    hour=hour,
    weight=weight,
    table_name=table_name,
    origen=origen,
    destination=destination,
    is_dev=True
)

# Generate a schema JSON for Artifactory
path_json = "lclsupplierspurchases.output.schema"
is_schema_origen_in = True
schema_type = "host"
convert_string = False
datax_generated_schema_artifactory(
    path_json=path_json,
    is_schema_origen_in=is_schema_origen_in,
    schema_type=schema_type,
    convert_string=convert_string
)

# Generate a schema JSON for Datum
spark = SparkSession.builder.master("local[*]").appName("SparkAPP").getOrCreate()
path = "fields_pe_datum2.csv"
storage_zone = "master"
datax_generated_schema_datum(
    spark=spark,
    path=path,
    table_name=table_name,
    origen=origen,
    destination=destination,
    storage_zone=storage_zone,
    convert_string=False
)
```
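The schema helpers above emit JSON artifacts. The exact layout spark_datax_tools produces is not documented here, so the snippet below is only a sketch of reading such a file back with the standard library; `sample_schema` and its field layout are hypothetical, not the library's actual output format.

```python
import json

# Hypothetical schema payload; the real layout produced by
# spark_datax_tools may differ.
sample_schema = {
    "name": "t_pmfi_lcl_suppliers_purchases",
    "fields": [
        {"name": "supplier_id", "type": "string"},
        {"name": "purchase_amount", "type": "decimal(17,2)"},
    ],
}

# Round-trip through a file the way a generated artifact would be read.
with open("schema.json", "w", encoding="utf-8") as f:
    json.dump(sample_schema, f, indent=2)

with open("schema.json", encoding="utf-8") as f:
    schema = json.load(f)

field_names = [field["name"] for field in schema["fields"]]
print(field_names)  # ['supplier_id', 'purchase_amount']
```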



## License

[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).


## New features v1.0

 
## BugFix
- choco install visualcpp-build-tools



## Reference

 - Jonathan Quiza [github](https://github.com/jonaqp).
 - Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).
 - Jonathan Quiza [linkedin](https://www.linkedin.com/in/jonaqp/).



            
