# spark_dataframe_tools
[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Updates](https://pyup.io/repos/github/woctezuma/google-colab-transfer/shield.svg)](pyup)
[![Python 3](https://pyup.io/repos/github/woctezuma/google-colab-transfer/python-3-shield.svg)](pyup)
[![Code coverage](https://codecov.io/gh/woctezuma/google-colab-transfer/branch/master/graph/badge.svg)](codecov)
spark_dataframe_tools is a Python library that adds styled table output to Spark and pandas DataFrames.
## Installation
The package is published on PyPI, so installation consists of running:
```sh
pip install spark-dataframe-tools --user --upgrade
```
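To confirm that the install succeeded, the installed version can be read from the package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of spark_dataframe_tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version("spark-dataframe-tools"))
```

A `None` result usually means the package was installed into a different Python environment than the one running the check.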
## Usage
```python
import spark_dataframe_tools
```
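```python
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Create (or reuse) a Spark session; skip this in shells or notebooks that already define `spark`.
spark = SparkSession.builder.appName("spark_dataframe_tools_example").getOrCreate()

# Sample rows: first name, middle name, last name, id, gender, salary
data2 = [
    ("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
    ("Robert", "", "Williams", "42114", "M", 4000),
    ("Maria", "Anne", "Jones", "39192", "F", 4000),
    ("Jen", "Mary", "Brown", "", "F", -1),
]

# Schema matching the tuples above; every column is nullable
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data2, schema=schema)
```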
## Pandas
```python
df_pandas = df.toPandas()
df_pandas.show2()
```
## Spark
```python
# Display the DataFrame as a template table
df.show2()
# Report the DataFrame's memory usage
df.size()
```
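The exact output of `show2()` is not documented here. As a rough illustration of what a table-rendering helper does in general, the sketch below formats column names and rows as a fixed-width text table in plain Python. The function `render_table` is hypothetical and is not part of spark_dataframe_tools; the library's own output may look quite different.

```python
def render_table(columns, rows):
    """Render rows as a fixed-width text table (illustrative helper only)."""
    # Convert every value to text so widths can be measured.
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry, header included.
    widths = [
        max([len(col)] + [len(row[i]) for row in cells])
        for i, col in enumerate(columns)
    ]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(values):
        # Left-align each value and pad it to its column width.
        return "|" + "|".join(f" {v:<{w}} " for v, w in zip(values, widths)) + "|"

    lines = [border, fmt(columns), border]
    lines += [fmt(row) for row in cells]
    lines.append(border)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(["firstname", "salary"], [("James", 3000), ("Jen", -1)]))
```

With a Spark DataFrame, `df.columns` and `df.collect()` could supply the two arguments, assuming the data is small enough to bring to the driver.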
## License
[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).
## New features v1.0
## BugFix
- On Windows, install the Visual C++ build tools with Chocolatey: `choco install visualcpp-build-tools`
## Reference
- Jonathan Quiza [github](https://github.com/jonaqp).
- Jonathan Quiza [RumiMLSpark](http://rumi-ml.herokuapp.com/).