sparkautomapper

Name: sparkautomapper
Version: 0.1.77
Home page: https://github.com/imranq2/SparkAutoMapper
Summary: AutoMapper for Spark
Upload time: 2020-12-01 04:26:03
Author: Imran Qureshi
Requires Python: >=3

# SparkAutoMapper
Fluent API to map data from one view to another in Spark.  

Uses native Spark functions underneath, so it is just as fast as writing the transformations by hand.

Since this is just Python, you can use any Python editor. And because everything is typed using Python type hints, most editors will auto-complete and warn you when you do something wrong.

## Usage
```shell
pip install sparkautomapper
```
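
To pin the exact version documented on this page:
```shell
pip install sparkautomapper==0.1.77
```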

## SparkAutoMapper input and output
You can either pass a dataframe to SparkAutoMapper or specify the name of a Spark view to read from.

You can receive the result as a dataframe or (optionally) pass in the name of a view to write the result to. Both modes are shown in the examples below.

## Dynamic Typing Examples
#### Set a column in destination to a text value (read from a passed-in dataframe and return the result as a new dataframe)
```python
from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    keys=["member_id"]
).columns(
    dst1="hello"
)
```

#### Set a column in destination to a text value (read from a Spark view and put result in another Spark view)
```python
from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="hello"
)
```

#### Set a column in destination to an int value
```python
from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1=1050
)
```

#### Copy a column (src1) from source_view to destination view column (dst1)
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1=A.column("src1")
)
```
Or you can use the shortcut for specifying a column: wrap the column name in `[]`.
```python
from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="[src1]"
)
```

#### Convert data type for a column (or string literal)
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    birthDate=A.date(A.column("date_of_birth"))
)
```

#### Use a Spark SQL expression (any valid Spark SQL expression can be used)
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    gender=A.expression(
        """
        CASE
            WHEN `Member Sex` = 'F' THEN 'female'
            WHEN `Member Sex` = 'M' THEN 'male'
            ELSE 'other'
        END
        """
    )
)
```

#### Specify multiple transformations
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="[src1]",
    birthDate=A.date("[date_of_birth]"),
    gender=A.expression(
        """
        CASE
            WHEN `Member Sex` = 'F' THEN 'female'
            WHEN `Member Sex` = 'M' THEN 'male'
            ELSE 'other'
        END
        """
    )
)
```

#### Use variables or parameters
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

def mapping(parameters: dict):
    mapper = AutoMapper(
        view="members",
        source_view="patients",
        keys=["member_id"]
    ).columns(
        dst1=A.column(parameters["my_column_name"])
    )
    return mapper
```

#### Use conditional logic
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

def mapping(parameters: dict):
    mapper = AutoMapper(
        view="members",
        source_view="patients",
        keys=["member_id"]
    ).columns(
        dst1=A.column(parameters["my_column_name"])
    )

    if parameters["customer"] == "Microsoft":
        mapper = mapper.columns(
            important_customer=1,
            customer_name=parameters["customer"]
        )
    return mapper
```
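
A short usage sketch for the function above (the parameter values are illustrative):
```python
# Build a customer-specific mapper; "src1" and "Microsoft" are example values.
mapper = mapping(parameters={
    "my_column_name": "src1",
    "customer": "Microsoft"
})
# The returned mapper can then be executed with mapper.transform(df=...),
# as shown under "Executing the AutoMapper" below.
```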

#### Using nested array columns
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.list(
        [
            "address1",
            "address2"
        ]
    )
)
```

#### Using nested struct columns
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.complex(
        use="usual",
        family="imran"
    )
)
```

#### Using lists of structs
```python
from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.list(
        [
            A.complex(
                use="usual",
                family="imran"
            ),
            A.complex(
                use="usual",
                family="[last_name]"
            )
        ]
    )
)
```

## Executing the AutoMapper 
```python
from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.getOrCreate()  # or reuse your existing Spark session

spark.createDataFrame(
    [
        (1, 'Qureshi', 'Imran'),
        (2, 'Vidal', 'Michael'),
    ],
    ['member_id', 'last_name', 'first_name']
).createOrReplaceTempView("patients")

source_df: DataFrame = spark.table("patients")

# Seed the destination (members) view with just the key column
df = source_df.select("member_id")
df.createOrReplaceTempView("members")

result_df: DataFrame = mapper.transform(df=df)
```
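
To verify the mapping, inspect `result_df` with standard PySpark calls:
```python
# Inspect the mapped output
result_df.printSchema()
result_df.show(truncate=False)
```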

## Statically Typed Examples
To improve the auto-complete and syntax checking even more, you can define Complex types:

Define a custom data type:
```python
from spark_auto_mapper.type_definitions.automapper_defined_types import AutoMapperTextInputType
from spark_auto_mapper.helpers.automapper_value_parser import AutoMapperValueParser
from spark_auto_mapper.data_types.date import AutoMapperDateDataType
from spark_auto_mapper.data_types.list import AutoMapperList
from spark_auto_mapper_fhir.fhir_types.automapper_fhir_data_type_complex_base import AutoMapperFhirDataTypeComplexBase


class AutoMapperFhirDataTypePatient(AutoMapperFhirDataTypeComplexBase):
    # noinspection PyPep8Naming
    def __init__(self,
                 id_: AutoMapperTextInputType,
                 birthDate: AutoMapperDateDataType,
                 name: AutoMapperList,
                 gender: AutoMapperTextInputType
                 ) -> None:
        super().__init__()
        self.value = dict(
            id=AutoMapperValueParser.parse_value(id_),
            birthDate=AutoMapperValueParser.parse_value(birthDate),
            name=AutoMapperValueParser.parse_value(name),
            gender=AutoMapperValueParser.parse_value(gender)
        )

```

Now you get auto-complete and syntax checking:
```python
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

# Note: AutoMapperFhir and the FHIR helper F used below come from the companion
# spark_auto_mapper_fhir package; their import statements are omitted here.
mapper = AutoMapperFhir(
    view="members",
    source_view="patients",
    keys=["member_id"]
).withResource(
    resource=F.patient(
        id_=A.column("a.member_id"),
        birthDate=A.date(
            A.column("date_of_birth")
        ),
        name=A.list(
            F.human_name(
                use="usual",
                family=A.column("last_name")
            )
        ),
        gender="female"
    )
)
```

## Publishing a new package
1. Edit VERSION to increment the version
2. Create a new release
3. The GitHub Action should automatically kick in and publish the package
4. You can see the status in the Actions tab
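
A hedged command-line sketch of these steps (assuming the GitHub CLI `gh` is installed and the repository's Action publishes on release; the version number is illustrative):
```shell
# 1. Bump the version file and push it
echo "0.1.78" > VERSION
git commit -am "Bump version to 0.1.78"
git push

# 2-3. Create a release; this triggers the publish GitHub Action
gh release create v0.1.78 --title "v0.1.78" --notes "Release 0.1.78"

# 4. Watch the workflow status from the Actions tab, or:
gh run watch
```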



            
