# spark_scaffolder_transforms_tools
[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
spark_scaffolder_transforms_tools is a Python library that implements transformations for Kirby 3.x.
## Installation
> The code is packaged for PyPI, so installation consists of running:
```sh
pip install spark-scaffolder-transforms-tools --upgrade
```
## Usage
## CacheTransformation (type = "cache")
Example:
```hocon
{
type = "cache"
inputs = ["input1", "input2"]
}
```
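For reference, here is a minimal PySpark sketch of what a cache step amounts to. The dataframes and the app name are illustrative and not part of this library's API:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-example").getOrCreate()

# Illustrative stand-ins for the registered inputs "input1" and "input2".
input1 = spark.range(10).withColumnRenamed("id", "user_id")
input2 = spark.range(5)

# Equivalent of the "cache" transformation: mark both inputs for caching
# so that later transformations reuse the materialized data.
input1.cache()
input2.cache()

input1.count()  # the first action materializes the cache
```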
## UnpersistTransformation (type = "unpersist")
Example:
```hocon
{
type = "unpersist"
inputs = ["input1", "input2"]
}
```
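Conceptually this mirrors PySpark's `DataFrame.unpersist()`. A minimal sketch with an illustrative dataframe:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unpersist-example").getOrCreate()

df = spark.range(100)
df.cache()
df.count()  # materialize the cache

# Equivalent of the "unpersist" transformation: free the cached blocks
# once the dataframe is no longer needed downstream.
df.unpersist()
```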
## InitalizeNullsTransformation (type = "initialize-nulls")
Initializes *null* values in the indicated columns with a default value.
Its configuration accepts the following parameters:
| Parameter | Description | Type | Required |
|:---------------:|:-----------------------------------------------------------|:---------------:|:-----------------------------:|
| **field** | name of the column in which *null* values are searched for | string | YES, mutually exclusive with **fields** |
| **fields** | list of column names in which *null* values are searched for | list of strings | YES, mutually exclusive with **field** |
| **default** | default value used to replace the *null* values | string | YES |
Usage examples:
```hocon
{
type = "initialize-nulls"
field = "col1"
default = "value1"
}
```
```hocon
{
type = "initialize-nulls"
fields = ["col1","col2","col3"]
default = "value"
}
```
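The same effect can be obtained directly with PySpark's `DataFrame.fillna`. A minimal sketch, assuming an illustrative dataframe with columns `col1` and `col2`:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("initialize-nulls-example").getOrCreate()

df = spark.createDataFrame(
    [("a", None), (None, "x"), ("b", "y")],
    ["col1", "col2"],
)

# field = "col1", default = "value1" -> fill nulls only in col1
single = df.fillna("value1", subset=["col1"])

# fields = ["col1", "col2"], default = "value" -> fill nulls in both columns
multiple = df.fillna("value", subset=["col1", "col2"])

multiple.show()
```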
## JoinTransformation (type = "join")
Performs *join* operations over multiple dataframes.
Its configuration accepts the following parameters:
| Parameter | Description | Type | Required |
|:---------------:|:----------------------------------------------------------------------------------------------------|:---------------:|:-----------:|
| **joinType** | type of join to perform | string | YES |
| **inputs** | list of **input** names; it must contain at least two elements | list of strings | YES |
| **joinColumns** | list of configs mapping each **input** to the list of columns used to perform the *join* | list of configs | YES |
| **output** | name of the **output** associated with the result of the *join* | string | YES |
Usage example: in this case a *leftanti* join is performed over the two dataframes associated with *t_users* and
*t_people*, using the columns *id2* and *id* respectively; the result of the join is assigned to the output *t_users_people*:
```hocon
{
type = "join"
inputs = ["t_users", "t_people"]
joinType = "leftanti"
joinColumns = [
{
"t_users" = ["id2"]
},
{
"t_people" = ["id"]
}
]
output = "t_users_people"
}
```
**Note**: To use the native Kirby type = "join" transformation instead, it must be registered in the *Shifu* files as
**type = "kirby-join"**.
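As a point of comparison, a plain PySpark sketch of the *leftanti* join described above; the sample dataframes are illustrative:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-example").getOrCreate()

t_users = spark.createDataFrame([(1, "ana"), (2, "luis"), (3, "eva")], ["id2", "name"])
t_people = spark.createDataFrame([(1,), (3,)], ["id"])

# A left anti join keeps only the t_users rows whose id2 has no match in t_people.id.
t_users_people = t_users.join(
    t_people,
    on=t_users["id2"] == t_people["id"],
    how="leftanti",
)

t_users_people.show()  # only the row with id2 = 2 remains
```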
## PipelineTransformation (type = "pipeline")
Applies a *pipeline* of transformations to one or more **inputs**, allowing simpler configuration files.
Its configuration accepts the following parameters:
| Parameter | Description | Type | Required |
|:------------:|:---------------------------------------------|:---------------:|:-----------:|
| **pipeline** | list of transformation configurations | list of configs | YES |
Usage example:
```hocon
{
type = "pipe"
inputs = ["t_users","t_people"]
pipeline = [
{
type = "literal"
field = "status"
default = "registered"
defaultType = "string"
},
{
type = "filter"
filters = [
{
field = "gf_odate_date"
op = "eq"
value = "20210110"
}
]
}
]
}
```
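The pipeline above (a literal column followed by a filter) corresponds to the following chained PySpark operations; the sample dataframe is illustrative:
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-example").getOrCreate()

t_users = spark.createDataFrame(
    [("u1", "20210110"), ("u2", "20210111")],
    ["user_id", "gf_odate_date"],
)

# Step 1 ("literal"): add a constant string column "status".
# Step 2 ("filter"): keep only rows where gf_odate_date == "20210110".
result = (
    t_users
    .withColumn("status", F.lit("registered"))
    .filter(F.col("gf_odate_date") == "20210110")
)

result.show()
```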
## UnionTransformation (type = "union")
Applies a *union* operation across multiple **inputs**; all **inputs** must share the same schema,
otherwise the operation will fail.
Its configuration accepts the following parameters:
| Parameter | Description | Type | Required |
|:----------:|:----------------------------------------------------------|:---------------:|:-----------:|
| **inputs** | list of **inputs** to be unioned | list of strings | YES |
| **output** | name of the **output** associated with the result of the *union* | string | YES |
Usage example:
```hocon
{
type = "union"
inputs = ["t_users_es","t_users_mx"]
output = "t_users"
}
```
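A plain PySpark sketch of the equivalent union; the sample dataframes are illustrative:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union-example").getOrCreate()

# Both inputs share the same schema, as the documentation requires.
t_users_es = spark.createDataFrame([(1, "ES")], ["id", "country"])
t_users_mx = spark.createDataFrame([(2, "MX")], ["id", "country"])

# Equivalent of the "union" transformation: stack the rows of both inputs.
t_users = t_users_es.unionByName(t_users_mx)

t_users.show()
```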
## License
[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).
## New features v1.0
## BugFix
- choco install visualcpp-build-tools
## Reference
- Jonathan Quiza [github](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).
- Jonathan Quiza [linkedin](https://www.linkedin.com/in/jonaqp/).