# pyspark-test
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Unit Test](https://github.com/debugger24/pyspark-test/workflows/Unit%20Test/badge.svg?branch=main)](https://github.com/debugger24/pyspark-test/actions?query=workflow%3A%22Unit+Test%22)
[![PyPI version](https://badge.fury.io/py/pyspark-val.svg)](https://badge.fury.io/py/pyspark-val)
[![Downloads](https://pepy.tech/badge/pyspark-val)](https://pepy.tech/project/pyspark-val)
PySpark validation & testing tooling.
# Installation
```
pip install pyspark-val
```
# Usage
```py
assert_pyspark_df_equal(expected_df, actual_df)
```
## Additional Arguments
* `check_dtype` : Compare the data types of the two DataFrames. Defaults to `True`.
* `check_column_names` : Compare column names. Defaults to `False`. Not required if data types are being checked, since the dtype comparison already covers column names.
* `check_columns_in_order` : Require the columns to appear in the same order. Defaults to `False`.
* `order_by` : Column names by which both DataFrames are sorted before comparing. Defaults to `None`.
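The semantics of these options can be sketched in plain Python. This is a hypothetical illustration of the comparison logic, not the library's actual implementation (which operates on PySpark DataFrames):

```python
def rows_equal(left, right, left_cols, right_cols,
               order_by=None, check_columns_in_order=False):
    """Hypothetical sketch: compare two row sets the way the
    assert options are described above. `left`/`right` are lists
    of dicts keyed by column name."""
    # check_columns_in_order: require identical column order;
    # otherwise compare the column sets order-insensitively.
    if check_columns_in_order:
        if left_cols != right_cols:
            return False
    elif sorted(left_cols) != sorted(right_cols):
        return False
    # order_by: sort both sides first, so row order in the
    # source data does not affect the result.
    if order_by:
        key = lambda row: tuple(str(row[c]) for c in order_by)
        left = sorted(left, key=key)
        right = sorted(right, key=key)
    return left == right
```

For example, two row sets that differ only in row order compare equal once `order_by` is supplied, but unequal without it.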
# Example
```py
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DateType, DoubleType, LongType, StringType, StructField, StructType
)

from pyspark_test import assert_pyspark_df_equal

spark_session = SparkSession.builder.getOrCreate()

df_1 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

df_2 = spark_session.createDataFrame(
    data=[
        [datetime.date(2020, 1, 1), 'demo', 1.123, 10],
        [None, None, None, None],
    ],
    schema=StructType(
        [
            StructField('col_a', DateType(), True),
            StructField('col_b', StringType(), True),
            StructField('col_c', DoubleType(), True),
            StructField('col_d', LongType(), True),
        ]
    ),
)

assert_pyspark_df_equal(df_1, df_2)
```