spark-scaffolder-transforms-tools


| Field | Value |
|:------|:------|
| Name | spark-scaffolder-transforms-tools |
| Version | 0.0.1 |
| home_page | https://github.com/jonaqp/spark_scaffolder_transforms_tools |
| Summary | spark_scaffolder_transforms_tools |
| upload_time | 2024-04-04 08:43:33 |
| maintainer | None |
| docs_url | None |
| author | Jonathan Quiza |
| requires_python | None |
| license | None |
| keywords | spark, scaffolder, pyspark |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# spark_scaffolder_transforms_tools


[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)




spark_scaffolder_transforms_tools is a Python library that implements Kirby3x transformations.

## Installation

> The package is published on PyPI, so it can be installed by running:

```sh
pip install spark-scaffolder-transforms-tools --upgrade
```


## Usage


## CacheTransformation (type = "cache")

Example:

```hocon
    {
        type = "cache"
        inputs = ["input1", "input2"]
    }
```

## UnpersistTransformation (type = "unpersist")
Example:

```hocon
    {
        type = "unpersist"
        inputs = ["input1", "input2"]
    }
```

## InitalizeNullsTransformation (type = "initialize-nulls")
Initializes *null* values in the specified columns with a default value.

Its configuration accepts the following parameters:


|  Parameter  | Description                                                | Type           |               Required                |
|:-----------:|:-----------------------------------------------------------|:--------------:|:-------------------------------------:|
|  **field**  | name of the column in which *null* values are looked up    |     string     | YES, mutually exclusive with **fields** |
| **fields**  | list of column names in which *null* values are looked up  | list of string | YES, mutually exclusive with **field**  |
| **default** | default value that replaces the *null* values              |     string     |                  YES                  |

Usage examples:

```hocon
    {
        type = "initialize-nulls"
        field = "col1"
        default = "value1"
    }
```

```hocon
    {
        type = "initialize-nulls"
        fields = ["col1","col2","col3"]
        default = "value"
    }
```
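
As a rough illustration of the semantics described above, the following plain-Python sketch mimics `initialize-nulls` on rows represented as dictionaries. The helper `initialize_nulls` and the sample rows are hypothetical and are not part of this library; in PySpark itself, a comparable effect is available through `DataFrame.fillna(value, subset=[...])`.

```python
from typing import Any, Dict, List, Optional


def initialize_nulls(
    rows: List[Dict[str, Any]],
    default: Any,
    field: Optional[str] = None,
    fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: replace None in the chosen columns with `default`."""
    # `field` and `fields` are mutually exclusive, as in the parameter table.
    if (field is None) == (fields is None):
        raise ValueError("provide exactly one of 'field' or 'fields'")
    columns = [field] if field is not None else list(fields)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column in columns:
            if column in new_row and new_row[column] is None:
                new_row[column] = default
        result.append(new_row)
    return result


rows = [{"col1": None, "col2": "a"}, {"col1": "x", "col2": None}]
filled = initialize_nulls(rows, default="value", fields=["col1", "col2"])
```

The check on `field` and `fields` reflects the mutual exclusion stated in the table; how the library itself validates this is not documented here.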

## JoinTransformation (type = "join")
Performs *join* operations across multiple dataframes.

Its configuration accepts the following parameters:

|    Parameter    | Description                                                                            |      Type      | Required |
|:---------------:|:---------------------------------------------------------------------------------------|:--------------:|:--------:|
|  **joinType**   | the type of join to perform                                                            |     string     |   YES    |
|   **inputs**    | list of **input** names; it must contain at least two elements                         | list of string |   YES    |
| **joinColumns** | list of configs, each mapping an **input** to the list of columns used for the *join*  | list of config |   YES    |
|   **output**    | name of the **output** that receives the result of the *join*                          |     string     |   YES    |

Usage example: this configuration performs a *leftanti* join on the two dataframes associated with *t_users* and
*t_people*, using the columns *id2* and *id* respectively, and assigns the result to the output *t_users_people*:

```hocon
    {
        type = "join"
        inputs = ["t_users", "t_people"]
        joinType = "leftanti"
        joinColumns = [
            { 
                "t_users" = ["id2"] 
            },
            { 
                "t_people" = ["id"] 
            }
        ]
        output = "t_users_people"
    }
```

**Attention**: To use the native Kirby type = "join" transformation, it must be registered in the *Shifu* files
as **type = "kirby-join"**
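
To make the *leftanti* example more concrete, here is a minimal plain-Python sketch of what a left anti join returns: the rows of the first input whose join key has no match in the second input. The helper `left_anti_join` and the sample rows are hypothetical and only illustrate the join type; they do not reproduce the library's implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_anti_join(
    left: List[Row], right: List[Row], left_cols: List[str], right_cols: List[str]
) -> List[Row]:
    """Hypothetical helper: keep left rows with no matching key in right."""
    if len(left_cols) != len(right_cols):
        raise ValueError("both inputs need the same number of join columns")
    # Collect the join keys present on the right side.
    # Simplification: None keys compare equal here, unlike SQL null semantics.
    right_keys = {tuple(row[c] for c in right_cols) for row in right}
    # A left row survives only when its key is absent from the right side.
    return [row for row in left if tuple(row[c] for c in left_cols) not in right_keys]


# Mirrors joinColumns = [{ "t_users" = ["id2"] }, { "t_people" = ["id"] }]
t_users = [{"id2": 1, "name": "ana"}, {"id2": 2, "name": "luis"}]
t_people = [{"id": 2, "age": 30}]
t_users_people = left_anti_join(t_users, t_people, ["id2"], ["id"])
```

Only columns from the first input appear in the result, which is why a left anti join is often used to find records missing from another table.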

## PipelineTransformation (type = "pipeline")
Applies a *pipeline* of transformations to one or more **inputs**, which allows for simpler configuration files.

Its configuration accepts the following parameters:


|  Parameter   | Description                            |      Type      | Required |
|:------------:|:---------------------------------------|:--------------:|:--------:|
| **pipeline** | list of transformation configurations  | list of config |   YES    |

Usage examples:

```hocon
    {
        type = "pipe"
        inputs = ["t_users","t_people"]
        pipeline = [
            {
                type = "literal"
                field = "status"
                default = "registered"
                defaultType = "string"
            },
            {
                type = "filter"
                filters = [
                    {
                        field = "gf_odate_date"
                        op = "eq"
                        value = "20210110"
                    }
                ]
            }
        ]
    }
```
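
The idea of a pipeline can be sketched in plain Python as a list of step configurations applied in order to each input. Everything below is an assumption made for illustration: the step registry `STEPS`, the helpers `apply_literal`, `apply_filter` and `run_pipeline`, the choice to process each input independently, and the rule that several filters are combined with AND. None of these names come from the library.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def apply_literal(table: Table, step: Dict[str, Any]) -> Table:
    # Hypothetical "literal" step: add a constant column to every row.
    return [{**row, step["field"]: step["default"]} for row in table]


def apply_filter(table: Table, step: Dict[str, Any]) -> Table:
    # Hypothetical "filter" step: only the "eq" operator is sketched here.
    for condition in step["filters"]:
        if condition["op"] != "eq":
            raise ValueError(f"unsupported op: {condition['op']}")
        table = [r for r in table if r.get(condition["field"]) == condition["value"]]
    return table


# Hypothetical registry mapping a step type to the function that applies it.
STEPS: Dict[str, Callable[[Table, Dict[str, Any]], Table]] = {
    "literal": apply_literal,
    "filter": apply_filter,
}


def run_pipeline(inputs: Dict[str, Table], pipeline: List[Dict[str, Any]]) -> Dict[str, Table]:
    """Apply every step, in order, to each input independently."""
    outputs = {}
    for name, table in inputs.items():
        for step in pipeline:
            table = STEPS[step["type"]](table, step)
        outputs[name] = table
    return outputs


pipeline = [
    {"type": "literal", "field": "status", "default": "registered"},
    {"type": "filter", "filters": [{"field": "gf_odate_date", "op": "eq", "value": "20210110"}]},
]
inputs = {
    "t_users": [{"id": 1, "gf_odate_date": "20210110"}, {"id": 2, "gf_odate_date": "20210111"}],
    "t_people": [{"id": 7, "gf_odate_date": "20210110"}],
}
result = run_pipeline(inputs, pipeline)
```

Declaring the steps once and reusing them for several inputs is what keeps the configuration shorter than repeating each transformation per input.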

## UnionTransformation (type = "union")
Applies a *union* operation across multiple **inputs**. All **inputs** must share the same schema;
otherwise the operation fails.

Its configuration accepts the following parameters:


| Parameter  | Description                                                      |      Type      | Required |
|:----------:|:-----------------------------------------------------------------|:--------------:|:--------:|
| **inputs** | list of **inputs** to be combined                                | list of string |   YES    |
| **output** | name of the **output** that receives the result of the *union*   |     string     |   YES    |

Usage examples:

```hocon
    {
        type = "union"
        inputs = ["t_users_es","t_users_mx"]
        output = "t_users"
    }
```
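
The same-schema requirement can be illustrated with a short plain-Python sketch. Here a table is modelled as a pair of column names and rows, and the hypothetical helper `union_tables` refuses to combine tables whose column lists differ. This is only an illustration under those assumptions; column types are ignored and the library's own validation may differ.

```python
from typing import Any, List, Tuple

# A table is modelled as (column names, rows); column types are omitted.
Table = Tuple[List[str], List[Tuple[Any, ...]]]


def union_tables(tables: List[Table]) -> Table:
    """Hypothetical helper: concatenate tables that share the same columns."""
    if not tables:
        raise ValueError("at least one input is required")
    columns = tables[0][0]
    rows: List[Tuple[Any, ...]] = []
    for cols, table_rows in tables:
        # Reject any input whose columns differ from the first input's columns.
        if cols != columns:
            raise ValueError(f"schema mismatch: {cols} != {columns}")
        rows.extend(table_rows)  # duplicates are kept, nothing is deduplicated
    return columns, rows


t_users_es = (["id", "country"], [(1, "ES")])
t_users_mx = (["id", "country"], [(2, "MX")])
t_users = union_tables([t_users_es, t_users_mx])
```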


## License

[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).


## New features v1.0

 
## BugFix
- choco install visualcpp-build-tools



## Reference

 - Jonathan Quiza [github](https://github.com/jonaqp).
 - Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).
 - Jonathan Quiza [linkedin](https://www.linkedin.com/in/jonaqp/).



            
