pyspark-val


Name: pyspark-val
Version: 0.1.4
Home page: https://github.com/CarterFendley/pyspark-val
Summary: PySpark validation & testing tooling
Upload time: 2024-01-24 22:50:00
Author: Rahul Kumar, Carter Fendley
License: MIT
Keywords: assert, pyspark, unit, test, testing, compare, validation
Requirements: none recorded

# pyspark-test

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-val.svg)](https://badge.fury.io/py/pyspark-val)
[![Downloads](https://pepy.tech/badge/pyspark-val)](https://pepy.tech/project/pyspark-val)

PySpark validation & testing tooling.

# Installation

```
pip install pyspark-val
```

# Usage

```py
assert_pyspark_df_equal(left_df, right_df)
```

## Additional Arguments

* `check_dtype` : Whether to compare the column data types of the two Spark DataFrames. Defaults to true.
* `check_column_names` : Whether to compare column names. Defaults to false. Not required if data types are already being checked.
* `check_columns_in_order` : Whether the columns must appear in the same order. Defaults to false.
* `order_by` : Column names by which both DataFrames are sorted before comparing. Defaults to None.
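
To make the options above concrete, here is a minimal pure-Python sketch of how such checks could fit together. It is not the library's implementation: `assert_tables_equal`, `sort_rows`, and the schema-plus-rows stand-in for a DataFrame are hypothetical names introduced for illustration, and the order of the checks is an assumption.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (name, dtype) pairs plus row tuples.
Schema = List[Tuple[str, str]]
Row = Tuple[Any, ...]


def sort_rows(schema: Schema, rows: Sequence[Row], order_by: List[str]) -> List[Row]:
    """Sort rows by the named columns, looked up by position in the schema."""
    names = [name for name, _ in schema]
    idx = [names.index(col) for col in order_by]
    return sorted(rows, key=lambda row: tuple(row[i] for i in idx))


def assert_tables_equal(
    left_schema: Schema,
    left_rows: Sequence[Row],
    right_schema: Schema,
    right_rows: Sequence[Row],
    check_dtype: bool = True,
    check_column_names: bool = False,
    check_columns_in_order: bool = False,
    order_by: Optional[List[str]] = None,
) -> None:
    """Illustrative only: mimics the documented options on plain Python data."""

    def normalise(items):
        # When column order does not matter, compare sorted copies instead.
        return list(items) if check_columns_in_order else sorted(items)

    if check_column_names:
        left_names = normalise(name for name, _ in left_schema)
        right_names = normalise(name for name, _ in right_schema)
        assert left_names == right_names, "column names differ"

    if check_dtype:
        # Comparing (name, dtype) pairs covers the names as well.
        assert normalise(left_schema) == normalise(right_schema), "schemas differ"

    assert len(left_rows) == len(right_rows), "row counts differ"

    if order_by:
        left_rows = sort_rows(left_schema, left_rows, order_by)
        right_rows = sort_rows(right_schema, right_rows, order_by)

    # Rows are compared positionally in this sketch.
    assert list(left_rows) == list(right_rows), "row contents differ"


schema = [("id", "long"), ("name", "string")]
left = [(2, "b"), (1, "a")]
right = [(1, "a"), (2, "b")]

# Same rows in a different order: passes because both sides are sorted first.
assert_tables_equal(schema, left, schema, right, order_by=["id"])
```

In this sketch, comparing full `(name, dtype)` pairs already covers the column names, which is one way to read the note that `check_column_names` is not required when data types are checked.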

# Example

```py
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType,
    DoubleType,
    LongType,
    StringType,
    StructField,
    StructType,
)

from pyspark_test import assert_pyspark_df_equal

# Reuse the active session if one exists; otherwise start a new one.
spark_session = SparkSession.builder.getOrCreate()

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/CarterFendley/pyspark-val",
    "name": "pyspark-val",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "assert pyspark unit test testing compare validation",
    "author": "Rahul Kumar, Carter Fendley",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/6e/1e/e6ba2b95f44f5f8a8956ffeb18e84d83893ebc54d25e7ae89c772e9f2cd7/pyspark_val-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "PySpark validation & testing tooling",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/CarterFendley/pyspark-val"
    },
    "split_keywords": [
        "assert",
        "pyspark",
        "unit",
        "test",
        "testing",
        "compare",
        "validation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "de8d07db24311ffe281afcfde3591c276a45d8f54189aa1f9f2d55f443876c47",
                "md5": "669a33a348209f2fecc886998c335d5f",
                "sha256": "a3940167cb7ed5a2f17de29cde6448bc254278c394c6018f9bb20ec0d8c0825e"
            },
            "downloads": -1,
            "filename": "pyspark_val-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "669a33a348209f2fecc886998c335d5f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6477,
            "upload_time": "2024-01-24T22:49:58",
            "upload_time_iso_8601": "2024-01-24T22:49:58.727801Z",
            "url": "https://files.pythonhosted.org/packages/de/8d/07db24311ffe281afcfde3591c276a45d8f54189aa1f9f2d55f443876c47/pyspark_val-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6e1ee6ba2b95f44f5f8a8956ffeb18e84d83893ebc54d25e7ae89c772e9f2cd7",
                "md5": "f08a641a42bc67907d0aa5d3b3bbc3d7",
                "sha256": "babce0cd8d7f5ebe95cf232f60d5fce6d6c2dcf4149b671225561e17ece3558a"
            },
            "downloads": -1,
            "filename": "pyspark_val-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "f08a641a42bc67907d0aa5d3b3bbc3d7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5339,
            "upload_time": "2024-01-24T22:50:00",
            "upload_time_iso_8601": "2024-01-24T22:50:00.129962Z",
            "url": "https://files.pythonhosted.org/packages/6e/1e/e6ba2b95f44f5f8a8956ffeb18e84d83893ebc54d25e7ae89c772e9f2cd7/pyspark_val-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-24 22:50:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "CarterFendley",
    "github_project": "pyspark-val",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "pyspark-val"
}
        