pyspark-test


Namepyspark-test JSON
Version 0.2.0 PyPI version JSON
download
home_pagehttps://github.com/debugger24/pyspark-test
SummaryCheck that left and right spark DataFrame are equal.
upload_time2021-10-31 18:25:00
maintainer
docs_urlNone
authorRahul Kumar
requires_python
licenseApache Software License (Apache 2.0)
keywords assert pyspark unit test testing compare
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pyspark-test

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-test.svg)](https://badge.fury.io/py/pyspark-test)
[![Downloads](https://pepy.tech/badge/pyspark-test)](https://pepy.tech/project/pyspark-test)

Check that left and right spark DataFrame are equal.

This function is intended to compare two spark DataFrames and output any differences. It is inspired from pandas testing module but for pyspark, and for use in unit tests. Additional parameters allow varying the strictness of the equality checks performed.

# Installation

```
pip install pyspark-test
```

# Usage

```py
assert_pyspark_df_equal(left_df, actual_df)
```

## Additional Arguments

* `check_dtype` : To compare the data types of spark dataframe. Default true
* `check_column_names` : To compare column names. Default false. Not required of we are checking data types.
* `check_columns_in_order` : To check the columns should be in order or not. Default to false
* `order_by` : Column names with which dataframe must be sorted before comparing. Default None.

# Example

```py
import datetime

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import *

from pyspark_test import assert_pyspark_df_equal

sc = SparkContext.getOrCreate(conf=conf)
spark_session = SparkSession(sc)

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/debugger24/pyspark-test",
    "name": "pyspark-test",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "assert pyspark unit test testing compare",
    "author": "Rahul Kumar",
    "author_email": "rahulcomp24@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f8/a9/3ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76/pyspark_test-0.2.0.tar.gz",
    "platform": "",
    "description": "# pyspark-test\n\n[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)\n[![PyPI version](https://badge.fury.io/py/pyspark-test.svg)](https://badge.fury.io/py/pyspark-test)\n[![Downloads](https://pepy.tech/badge/pyspark-test)](https://pepy.tech/project/pyspark-test)\n\nCheck that left and right spark DataFrame are equal.\n\nThis function is intended to compare two spark DataFrames and output any differences. It is inspired from pandas testing module but for pyspark, and for use in unit tests. Additional parameters allow varying the strictness of the equality checks performed.\n\n# Installation\n\n```\npip install pyspark-test\n```\n\n# Usage\n\n```py\nassert_pyspark_df_equal(left_df, actual_df)\n```\n\n## Additional Arguments\n\n* `check_dtype` : To compare the data types of spark dataframe. Default true\n* `check_column_names` : To compare column names. Default false. Not required of we are checking data types.\n* `check_columns_in_order` : To check the columns should be in order or not. Default to false\n* `order_by` : Column names with which dataframe must be sorted before comparing. Default None.\n\n# Example\n\n```py\nimport datetime\n\nfrom pyspark import SparkContext\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.types import *\n\nfrom pyspark_test import assert_pyspark_df_equal\n\nsc = SparkContext.getOrCreate(conf=conf)\nspark_session = SparkSession(sc)\n\ndf_1 = spark_session.createDataFrame(\n    data=[\n        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],\n        [None, None, None, None],\n    ],\n    schema=StructType(\n        [\n            StructField('col_a', DateType(), True),\n            StructField('col_b', StringType(), True),\n            StructField('col_c', DoubleType(), True),\n            StructField('col_d', LongType(), True),\n        ]\n    ),\n)\n\ndf_2 = spark_session.createDataFrame(\n    data=[\n        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],\n        [None, None, None, None],\n    ],\n    schema=StructType(\n        [\n            StructField('col_a', DateType(), True),\n            StructField('col_b', StringType(), True),\n            StructField('col_c', DoubleType(), True),\n            StructField('col_d', LongType(), True),\n        ]\n    ),\n)\n\nassert_pyspark_df_equal(df_1, df_2)\n```\n\n\n",
    "bugtrack_url": null,
    "license": "Apache Software License (Apache 2.0)",
    "summary": "Check that left and right spark DataFrame are equal.",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/debugger24/pyspark-test"
    },
    "split_keywords": [
        "assert",
        "pyspark",
        "unit",
        "test",
        "testing",
        "compare"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ec326d75e7d5171393ead86c2a6c0aba5b5cdff495286537732c3a1ad05575c1",
                "md5": "e35c397e281ab7f8908b4ecfdfe2e73d",
                "sha256": "dd4fb03c4f438f718a870a9268a459f2f8924829c767302f5515202707c97709"
            },
            "downloads": -1,
            "filename": "pyspark_test-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e35c397e281ab7f8908b4ecfdfe2e73d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7416,
            "upload_time": "2021-10-31T18:24:59",
            "upload_time_iso_8601": "2021-10-31T18:24:59.250622Z",
            "url": "https://files.pythonhosted.org/packages/ec/32/6d75e7d5171393ead86c2a6c0aba5b5cdff495286537732c3a1ad05575c1/pyspark_test-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f8a93ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76",
                "md5": "1e1975b8d80865b0396e5fb71db0a639",
                "sha256": "0d9d8d3a352a9b1c30761b0553a5771cb9dbb9a278955b3e7b0aed0ae13892d8"
            },
            "downloads": -1,
            "filename": "pyspark_test-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "1e1975b8d80865b0396e5fb71db0a639",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7038,
            "upload_time": "2021-10-31T18:25:00",
            "upload_time_iso_8601": "2021-10-31T18:25:00.872506Z",
            "url": "https://files.pythonhosted.org/packages/f8/a9/3ca6c0f3289da348d25693adb4f80e3d8b2389dea603f222feae4dd78e76/pyspark_test-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-10-31 18:25:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "debugger24",
    "github_project": "pyspark-test",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "pyspark-test"
}
        
Elapsed time: 0.10720s