# Sparta
Sparta is a simple library that helps you build ETL pipelines with PySpark.
## Important Sources
- <a href="https://spark.apache.org/">Apache Spark</a>
- <a href="https://pypi.org/project/smart-open/">Smart Open</a>
- <a href="https://github.com/MrPowers/chispa">Chispa</a>
## Installation
Install the latest version with `pip install pysparta`.
## Documentation
<a href="https://jcpsantos.github.io/sparta/">Sparta</a>
## Modules
### Extract
This is a module with functions for extracting and reading data.
**Example**
```python
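from sparta.extract import read_with_schema

# Schema as a DDL-style string: each column name is followed by its type
schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
# Location of the CSV file to read
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
# Read the file as CSV with the schema above, treating the first row as a header
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')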
```
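
The schema in the example above is a string of `name TYPE` pairs separated by commas. As a minimal sketch, and not part of Sparta, the plain-Python snippet below splits such a string into pairs so column names and types can be sanity-checked before reading. `parse_simple_schema` is a hypothetical helper introduced here for illustration, and it assumes simple types only: parameterized types such as `DECIMAL(10, 2)` contain commas and would need a real parser.

```python
from typing import List, Tuple


def parse_simple_schema(schema: str) -> List[Tuple[str, str]]:
    """Split a 'name TYPE, name TYPE' string into (name, type) pairs.

    Hypothetical helper for illustration only; it handles simple,
    unparameterized types and raises ValueError on anything else.
    """
    columns = []
    for field in schema.split(","):
        parts = field.split()
        if len(parts) != 2:
            raise ValueError(f"Expected 'name TYPE', got: {field.strip()!r}")
        name, dtype = parts
        # Normalize the type name to upper case for easier comparison
        columns.append((name, dtype.upper()))
    return columns


schema = "epidemiological_week LONG, date DATE, state STRING"
for name, dtype in parse_simple_schema(schema):
    print(f"{name}: {dtype}")
```

A check like this can catch a misspelled column or a missing type early, before the string reaches the reader.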
### Transformation
This is a module with data transformation functions.
**Example**
```python
from sparta.transformation import drop_duplicates
cols = ['longitude','latitude']
df = drop_duplicates(df, 'population', cols)
```
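
To make the idea of key-based deduplication concrete, here is a plain-Python sketch that does not use Sparta or Spark. It assumes that the second argument in the call above names an ordering column and that the list holds the key columns, and that the row with the largest ordering value is kept for each key combination. That reading of the arguments is an assumption, so check the Sparta documentation for the actual behavior. `keep_latest` is a hypothetical name used only for illustration.

```python
from typing import Dict, List


def keep_latest(rows: List[Dict], order_col: str, key_cols: List[str]) -> List[Dict]:
    """Keep, for each key combination, the row with the largest order_col.

    Hypothetical helper: an in-memory illustration of the idea, not
    Sparta's implementation.
    """
    best: Dict[tuple, Dict] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # Replace the stored row only when this one ranks higher
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    return list(best.values())


rows = [
    {"longitude": 1.0, "latitude": 2.0, "population": 100},
    {"longitude": 1.0, "latitude": 2.0, "population": 250},
    {"longitude": 3.0, "latitude": 4.0, "population": 80},
]
deduped = keep_latest(rows, "population", ["longitude", "latitude"])
print(deduped)
```

In Spark the same effect is usually achieved with a window partitioned by the key columns rather than a dictionary, since the data does not fit in one process.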
### Load
This is a module with load and write functions.
**Example**
```python
from sparta.load import create_hive_table
create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
```
### Others
This is a module with several functions that can help in ETL work.
**Example**
```python
from sparta.secret import get_secret_aws
get_secret_aws('secret_name', 'sa-east-1')
```
## Supported PySpark / Python versions
Sparta currently supports PySpark 3.0+ and Python 3.7+.
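
A quick way to confirm that an environment meets these minimums is to compare version numbers as tuples. The sketch below is illustrative only: `meets_minimum` is a hypothetical helper, and it assumes the major and minor components of the version string are purely numeric (so a string like `3.0rc1` would need extra handling).

```python
import sys
from typing import Tuple


def meets_minimum(version: str, minimum: Tuple[int, int]) -> bool:
    """Return True if a 'major.minor[.patch]' version is >= minimum.

    Hypothetical helper; assumes numeric major and minor components.
    """
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= minimum


# Check the running interpreter against the Python 3.7+ requirement
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print("Python OK:", meets_minimum(python_version, (3, 7)))

# Check an example PySpark version string against the 3.0+ requirement
print("PySpark OK:", meets_minimum("3.2.1", (3, 0)))
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"3.10"` would sort before `"3.7"`.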
## Raw data
{
"_id": null,
"home_page": "https://github.com/jcpsantos/sparta",
"name": "pysparta",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "spark etl data sparta",
"author": "Juan Caio",
"author_email": "juancaiops@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/55/e4/97b85f9a4f824a76456679d139a42d13a270558309b33d8994977983eb77/pysparta-0.5.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU General Public License v2.0",
"summary": "Library to help ETL using pyspark",
"version": "0.5.5",
"project_urls": {
"Documentation": "https://jcpsantos.github.io/sparta/",
"Homepage": "https://github.com/jcpsantos/sparta",
"Source code": "https://github.com/jcpsantos/sparta"
},
"split_keywords": [
"spark",
"etl",
"data",
"sparta"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "48e5062997709c80ff790a02efcc311df955467351739ee4484f69b489b0912b",
"md5": "6d8b91a03cc91528f9ea27e72fa91204",
"sha256": "8632604e50f7a6b4e58e100733f24e0e2e701f67857102c25a015c2ef00fd81a"
},
"downloads": -1,
"filename": "pysparta-0.5.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6d8b91a03cc91528f9ea27e72fa91204",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 24447,
"upload_time": "2024-12-05T20:30:41",
"upload_time_iso_8601": "2024-12-05T20:30:41.138715Z",
"url": "https://files.pythonhosted.org/packages/48/e5/062997709c80ff790a02efcc311df955467351739ee4484f69b489b0912b/pysparta-0.5.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "55e497b85f9a4f824a76456679d139a42d13a270558309b33d8994977983eb77",
"md5": "c3b3f3a5ea43cccb73651b0f90e8c66b",
"sha256": "20cccb0d720a556028dc918dd2efe77c7fdb8224155fffeb192356163fef0b5f"
},
"downloads": -1,
"filename": "pysparta-0.5.5.tar.gz",
"has_sig": false,
"md5_digest": "c3b3f3a5ea43cccb73651b0f90e8c66b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 21484,
"upload_time": "2024-12-05T20:30:42",
"upload_time_iso_8601": "2024-12-05T20:30:42.968600Z",
"url": "https://files.pythonhosted.org/packages/55/e4/97b85f9a4f824a76456679d139a42d13a270558309b33d8994977983eb77/pysparta-0.5.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-05 20:30:42",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jcpsantos",
"github_project": "sparta",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "azure-storage-blob",
"specs": [
[
"==",
"12.12.0"
]
]
},
{
"name": "boto3",
"specs": [
[
"==",
"1.24.7"
]
]
},
{
"name": "chispa",
"specs": [
[
"==",
"0.9.2"
]
]
},
{
"name": "pyspark",
"specs": [
[
"==",
"3.2.1"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"7.1.2"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0"
]
]
},
{
"name": "smart-open",
"specs": [
[
"==",
"6.0.0"
]
]
},
{
"name": "delta-spark",
"specs": [
[
"==",
"3.2.1"
]
]
}
],
"lcname": "pysparta"
}