pysparta


- Name: pysparta
- Version: 0.5.6
- Home page: https://github.com/jcpsantos/sparta
- Summary: Library to help ETL using pyspark
- Upload time: 2025-01-06 19:34:53
- Author: Juan Caio
- Requires Python: >=3.7
- License: GNU General Public License v2.0
- Keywords: spark, etl, data, sparta
- Requirements: azure-storage-blob, boto3, chispa, pyspark, pytest, PyYAML, smart-open, delta-spark

# Sparta

Sparta is a simple library that helps you build ETL pipelines with PySpark.

## Important Sources

- <a href="https://spark.apache.org/">Apache Spark</a>
- <a href="https://pypi.org/project/smart-open/">Smart Open</a>
- <a href="https://github.com/MrPowers/chispa">Chispa</a>

## Installation

Install the latest version with `pip install pysparta`.

## Documentation

<a href="https://jcpsantos.github.io/sparta/">Sparta</a>

## Modules

### Extract

This is a module with functions for extracting and reading data.

**Example**

```python
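from sparta.extract import read_with_schema

# Schema as a DDL-formatted string: each column name is followed by its Spark SQL type.
schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
# Read the CSV file with the explicit schema; 'header': 'true' treats the first row as column names.
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')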
```
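
Long schema strings like the one above are easy to mistype. As a minimal sketch, the string can be assembled from (name, type) pairs in plain Python. Note that `build_ddl_schema` is a hypothetical helper written for this illustration; it is not part of Sparta, and the column subset is only an example.

```python
from typing import Iterable, Tuple


def build_ddl_schema(columns: Iterable[Tuple[str, str]]) -> str:
    """Join (name, type) pairs into a 'name TYPE, name TYPE' string.

    Hypothetical helper for illustration only; not a Sparta function.
    """
    parts = []
    for name, spark_type in columns:
        name = name.strip()
        spark_type = spark_type.strip()
        if not name or not spark_type:
            raise ValueError("each column needs a non-empty name and type")
        parts.append(f"{name} {spark_type}")
    return ", ".join(parts)


# A subset of the columns used in the example above.
columns = [
    ("epidemiological_week", "LONG"),
    ("date", "DATE"),
    ("state", "STRING"),
    ("last_available_confirmed", "INT"),
]
schema = build_ddl_schema(columns)
print(schema)
```

The resulting string has the same shape as the `schema` argument in the example above, so it could be passed in the same position.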

### Transformation

This is a module with data transformation functions.

**Example**

```python
from sparta.transformation import drop_duplicates

cols = ['longitude','latitude']
df = drop_duplicates(df, 'population', cols)
```

### Load

This is a module with load and write functions.

**Example**

```python
from sparta.load import create_hive_table

create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
```

### Others

This is a module with several functions that can help in ETL work.

**Example**

```python
from sparta.secret import get_secret_aws

get_secret_aws('Nome_Secret', 'sa-east-1')
```

## Supported PySpark / Python versions

Sparta currently supports PySpark 3.0+ and Python 3.7+.
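
To check an environment against these minimums before installing, a small script along the following lines may help. This is a sketch under assumptions: the helper names `parse_version` and `is_supported` are made up for this example, and only the major and minor version numbers are compared.

```python
import re
import sys

MIN_PYTHON = (3, 7)   # from "Python 3.7+"
MIN_PYSPARK = (3, 0)  # from "PySpark 3.0+"


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str, minimum: tuple) -> bool:
    """Compare a version string against a (major, minor) minimum."""
    return parse_version(version) >= minimum


python_ok = sys.version_info[:2] >= MIN_PYTHON

try:
    import pyspark
    pyspark_ok = is_supported(pyspark.__version__, MIN_PYSPARK)
except ImportError:
    pyspark_ok = None  # PySpark is not installed in this environment

print("Python OK:", python_ok)
print("PySpark OK:", pyspark_ok)
```

A `pyspark_ok` value of `None` means PySpark could not be imported, which is distinct from an installed but unsupported version.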

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jcpsantos/sparta",
    "name": "pysparta",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "spark etl data sparta",
    "author": "Juan Caio",
    "author_email": "juancaiops@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/10/73/db5eefadd41ee7713aa7aef2f4cd34e1ec371b12aaa5a9375874c5df7aac/pysparta-0.5.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU General Public License v2.0",
    "summary": "Library to help ETL using pyspark",
    "version": "0.5.6",
    "project_urls": {
        "Documentation": "https://jcpsantos.github.io/sparta/",
        "Homepage": "https://github.com/jcpsantos/sparta",
        "Source code": "https://github.com/jcpsantos/sparta"
    },
    "split_keywords": [
        "spark",
        "etl",
        "data",
        "sparta"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4abf1ee2defecf34d6e9195410fe67073fe99ae2eaea738064e2a82a3a908dbd",
                "md5": "5c06767948c9ee67bbc40a279459d091",
                "sha256": "8e8d0106bfed06873dcbd405eb43b323b21a9ddc50f21598fde0199ed2d0b171"
            },
            "downloads": -1,
            "filename": "pysparta-0.5.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c06767948c9ee67bbc40a279459d091",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 24855,
            "upload_time": "2025-01-06T19:34:52",
            "upload_time_iso_8601": "2025-01-06T19:34:52.248140Z",
            "url": "https://files.pythonhosted.org/packages/4a/bf/1ee2defecf34d6e9195410fe67073fe99ae2eaea738064e2a82a3a908dbd/pysparta-0.5.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1073db5eefadd41ee7713aa7aef2f4cd34e1ec371b12aaa5a9375874c5df7aac",
                "md5": "679e14a3665ad0e026351f33a870ad8c",
                "sha256": "41dbbc9b00cc9d7fda7007f200db46a1cc1f4090b2a6d4b9682d2266c296740f"
            },
            "downloads": -1,
            "filename": "pysparta-0.5.6.tar.gz",
            "has_sig": false,
            "md5_digest": "679e14a3665ad0e026351f33a870ad8c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 21826,
            "upload_time": "2025-01-06T19:34:53",
            "upload_time_iso_8601": "2025-01-06T19:34:53.943077Z",
            "url": "https://files.pythonhosted.org/packages/10/73/db5eefadd41ee7713aa7aef2f4cd34e1ec371b12aaa5a9375874c5df7aac/pysparta-0.5.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-06 19:34:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jcpsantos",
    "github_project": "sparta",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "azure-storage-blob",
            "specs": [
                [
                    "==",
                    "12.12.0"
                ]
            ]
        },
        {
            "name": "boto3",
            "specs": [
                [
                    "==",
                    "1.24.7"
                ]
            ]
        },
        {
            "name": "chispa",
            "specs": [
                [
                    "==",
                    "0.9.2"
                ]
            ]
        },
        {
            "name": "pyspark",
            "specs": [
                [
                    "==",
                    "3.2.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "7.1.2"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0"
                ]
            ]
        },
        {
            "name": "smart-open",
            "specs": [
                [
                    "==",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "delta-spark",
            "specs": [
                [
                    "==",
                    "3.2.1"
                ]
            ]
        }
    ],
    "lcname": "pysparta"
}
        