spark-dataframe-tools


- Name: spark-dataframe-tools
- Version: 0.6.13
- Home page: https://github.com/jonaqp/spark_dataframe_tools/
- Summary: spark_dataframe_tools
- Upload time: 2024-08-13 20:45:31
- Author: Jonathan Quiza
- Keywords: spark, dataframe
- Maintainer, docs URL, required Python version, license: none recorded
- Requirements: none recorded

# spark_dataframe_tools

[![Github License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

spark_dataframe_tools is a Python library that adds styled display helpers to Spark and pandas DataFrames.

## Installation

The package is published on PyPI, so it can be installed by running:

```sh
pip install spark-dataframe-tools --user --upgrade
```
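
To confirm that the install succeeded, the installed version can be read from the package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of spark_dataframe_tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Look up the distribution name used in the pip command above.
found = installed_version("spark-dataframe-tools")
print(found if found else "spark-dataframe-tools is not installed")
```

The lookup uses the distribution name (`spark-dataframe-tools`, with hyphens), which differs from the import name (`spark_dataframe_tools`, with underscores).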

## Usage
```python
import spark_dataframe_tools
```

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Reuse the active session (for example in a notebook) or create a new one.
spark = SparkSession.builder.getOrCreate()

# Sample rows: first name, middle name, last name, id, gender, salary.
data2 = [
    ("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
    ("Robert", "", "Williams", "42114", "M", 4000),
    ("Maria", "Anne", "Jones", "39192", "F", 4000),
    ("Jen", "Mary", "Brown", "", "F", -1),
]

# Every column is nullable; only the salary is numeric.
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data2, schema=schema)
```

## Pandas

```python
# Convert the Spark DataFrame to pandas, then show the template table
df_pandas = df.toPandas()
df_pandas.show2()
```

## Spark

```python
# DataFrame template table
df.show2()

# DataFrame memory usage
df.size()
```
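
The examples above call `show2()` directly on DataFrame objects after `import spark_dataframe_tools`, which suggests that the import attaches extra methods to existing classes. The sketch below illustrates that general pattern on a stand-in class. It is an assumption about the mechanism, not the library's actual code; `TinyFrame` and `show_table` are hypothetical names.

```python
# Illustrative only: TinyFrame and show_table are hypothetical stand-ins,
# not part of pyspark, pandas, or spark_dataframe_tools.


class TinyFrame:
    """A minimal stand-in for a DataFrame: column names plus rows."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(row) for row in rows]


def show_table(self):
    """Return the frame as an aligned, pipe-separated text table."""
    cells = [self.columns] + [[str(value) for value in row] for row in self.rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(self.columns))]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]
    return "\n".join(lines)


# Attach the function to the class after the fact; every instance,
# including ones created earlier, now exposes it as a method.
TinyFrame.show2 = show_table

frame = TinyFrame(["firstname", "salary"], [("James", 3000), ("Maria", 4000)])
print(frame.show2())
```

Attaching methods this way keeps call sites short (`df.show2()`), at the cost of modifying a class the caller does not own, so the extra methods exist only after the patching module has been imported.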



## License

[Apache License 2.0](https://www.dropbox.com/s/8t6xtgk06o3ij61/LICENSE?dl=0).

## New features v1.0

## BugFix

- On Windows, install the Visual C++ build tools with Chocolatey: `choco install visualcpp-build-tools`

## Reference

- Jonathan Quiza on [GitHub](https://github.com/jonaqp).
- Jonathan Quiza, [RumiMLSpark](http://rumi-ml.herokuapp.com/).



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jonaqp/spark_dataframe_tools/",
    "name": "spark-dataframe-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "spark, dataframe",
    "author": "Jonathan Quiza",
    "author_email": "jony327@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/07/99/bfa416b6be62bf299d537e5d87b914b34fd027b0e94479a377981c76b23f/spark_dataframe_tools-0.6.13.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "spark_dataframe_tools",
    "version": "0.6.13",
    "project_urls": {
        "Download": "https://github.com/jonaqp/spark_dataframe_tools/archive/main.zip",
        "Homepage": "https://github.com/jonaqp/spark_dataframe_tools/"
    },
    "split_keywords": [
        "spark",
        " dataframe"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "351fc091f752ee0c9e36170f14aba179ed01d6f41e2dec0e0b69efffc38be7e2",
                "md5": "b59bb8c656354da77b46e5d86227228d",
                "sha256": "a167e5d7bc95b499bafe534d5e4572e6af829ccbfa7d324a8c61ba9283f4fafd"
            },
            "downloads": -1,
            "filename": "spark_dataframe_tools-0.6.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b59bb8c656354da77b46e5d86227228d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 12068,
            "upload_time": "2024-08-13T20:45:30",
            "upload_time_iso_8601": "2024-08-13T20:45:30.419292Z",
            "url": "https://files.pythonhosted.org/packages/35/1f/c091f752ee0c9e36170f14aba179ed01d6f41e2dec0e0b69efffc38be7e2/spark_dataframe_tools-0.6.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0799bfa416b6be62bf299d537e5d87b914b34fd027b0e94479a377981c76b23f",
                "md5": "8574002e7eef4cf72cc1a0d3fad3c3f5",
                "sha256": "8a3ca218f3070d2cba836a3f088b822c5eeff0f7f19feacc6206a28dbf6ad0ca"
            },
            "downloads": -1,
            "filename": "spark_dataframe_tools-0.6.13.tar.gz",
            "has_sig": false,
            "md5_digest": "8574002e7eef4cf72cc1a0d3fad3c3f5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10813,
            "upload_time": "2024-08-13T20:45:31",
            "upload_time_iso_8601": "2024-08-13T20:45:31.400492Z",
            "url": "https://files.pythonhosted.org/packages/07/99/bfa416b6be62bf299d537e5d87b914b34fd027b0e94479a377981c76b23f/spark_dataframe_tools-0.6.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-13 20:45:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jonaqp",
    "github_project": "spark_dataframe_tools",
    "github_not_found": true,
    "lcname": "spark-dataframe-tools"
}
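
The `digests` entries in the raw data can be used to check a downloaded archive before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the local file path in the usage comment is an assumption.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved to the current directory:
# matches_digest(
#     "spark_dataframe_tools-0.6.13.tar.gz",
#     "8a3ca218f3070d2cba836a3f088b822c5eeff0f7f19feacc6206a28dbf6ad0ca",
# )
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 10 KB archive but makes the helper reusable for larger downloads.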
        