# cupyd
[![PyPI - Version](https://img.shields.io/pypi/v/cupyd)](https://pypi.org/project/cupyd/)
![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fjalorub%2Fcupyd%2Frefs%2Fheads%2Fmain%2Fpyproject.toml&style=flat-square)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/jalorub/cupyd/ci.yaml?style=flat-square)](https://github.com/jalorub/cupyd/actions/workflows/ci.yaml?query=branch%3Amain++)
[![Coverage Status](https://coveralls.io/repos/github/jalorub/cupyd/badge.svg)](https://coveralls.io/github/jalorub/cupyd)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/cupyd?style=flat-square)](https://pypistats.org/packages/cupyd)
__
/\ \
___ __ __ _____ __ __ \_\ \
/'___\ /\ \/\ \ /\ '__`\ /\ \/\ \ /'_` \
/\ \__/ \ \ \_\ \ \ \ \L\ \ \ \ \_\ \ /\ \L\ \
\ \____\ \ \____/ \ \ ,__/ \/`____ \ \ \___,_\
\/____/ \/___/ \ \ \/ `/___/> \ \/__,_ /
\ \_\ /\___/
\/_/ \/__/
Python framework to create your own ETLs.
## Features
- Simple but powerful syntax.
- Modular approach that encourages re-using components across different ETLs.
- Parallelism out of the box, without the need to write multiprocessing code.
- Broad compatibility:
    - Runs on Unix, Windows and macOS.
    - Requires Python >= 3.9.
- Lightweight:
    - No dependencies for its core version.
    - [WIP] The API version will require [Falcon](https://falcon.readthedocs.io/en/stable/index.html),
      which is a minimalist ASGI/WSGI framework that doesn't require other packages to work.
    - [WIP] The Dashboard (full) version will require Falcon and [Dash](https://dash.plotly.com/).
## Usage
In this example we compute the factorials of 20,000 integers using multiprocessing,
and store the results in two separate lists: one for even values and another for odd values.
``` py title="basic_etl.py"
import math
from typing import Any, Iterator, Optional

from cupyd import ETL, Extractor, Transformer, Loader, Filter


class IntegerExtractor(Extractor):

    def __init__(self, total_items: int):
        super().__init__()
        self.total_items = total_items

        # generated integers will be passed to the workers in buckets of size 10
        self.configuration.bucket_size = 10

    def extract(self) -> Iterator[int]:
        for item in range(self.total_items):
            yield item


class Factorial(Transformer):

    def transform(self, item: int) -> int:
        return math.factorial(item)


class EvenOnly(Filter):

    # Optional[int] instead of `int | None` so the annotation also works on Python 3.9
    def filter(self, item: int) -> Optional[int]:
        # keep even values (lowest bit unset), discard odd ones by returning None
        return None if item & 1 else item


class OddOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # keep odd values (lowest bit set), discard even ones by returning None
        return item if item & 1 else None


class ListLoader(Loader):

    def __init__(self):
        super().__init__()
        self.configuration.run_in_main_process = True
        self.items = []

    def start(self):
        self.items = []

    def load(self, item: Any):
        self.items.append(item)


if __name__ == "__main__":
    # 1. Define the ETL Nodes
    ext = IntegerExtractor(total_items=20_000)
    factorial = Factorial()
    even_only = EvenOnly()
    odd_only = OddOnly()
    even_ldr = ListLoader()
    odd_ldr = ListLoader()

    # 2. Connect the Nodes to determine the data flow. Notice the ETL branches after the
    # factorial is computed
    ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

    # 3. Run the ETL with 8 workers (multiprocessing Processes)
    etl = ETL(extractor=ext)
    etl.run(workers=8, show_progress=True, monitor_performance=True)

    # 4. You can access the results stored in both Loaders after the ETL is finished
    even_factorials = even_ldr.items
    odd_factorials = odd_ldr.items
```
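To sanity-check the two result lists, it can help to compare them against a plain single-process computation. The sketch below uses only the standard library; `split_factorials` is a hypothetical helper name, not part of cupyd. Every factorial from 2! onward contains the factor 2, so the odd list should hold only 0! and 1!.

```python
import math
from typing import List, Tuple


def split_factorials(total_items: int) -> Tuple[List[int], List[int]]:
    """Single-process reference: factorials of 0..total_items-1, split by parity."""
    even: List[int] = []
    odd: List[int] = []
    for n in range(total_items):
        value = math.factorial(n)
        # A set lowest bit means the value is odd.
        (odd if value & 1 else even).append(value)
    return even, odd


if __name__ == "__main__":
    even, odd = split_factorials(10)
    print(len(even), len(odd))  # 8 2
    print(odd)                  # [1, 1]
```

Whether the ETL preserves item order when several workers are used is not stated here, so it is safer to compare sorted lists rather than rely on position.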
For more information, go to the [examples](cupyd/examples) directory.
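For readers curious how a `>>` connection syntax with list-based branching can work in Python at all, here is a toy sketch built on `__rshift__`. It is an illustration under stated assumptions, not cupyd's implementation: the `Node` class, its `head` and `describe` methods, and the parent/children bookkeeping are all hypothetical.

```python
from typing import List, Optional, Union


class Node:
    """Toy pipeline node (hypothetical; not cupyd's actual class)."""

    def __init__(self, name: str):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def head(self) -> "Node":
        """Walk up to the first node of the chain this node belongs to."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def __rshift__(self, other: Union["Node", List["Node"]]) -> "Node":
        branches = other if isinstance(other, list) else [other]
        for branch in branches:
            # `a >> b` evaluates to b, so a branch arrives as its last node;
            # walk back to its first node before attaching it here.
            start = branch.head()
            start.parent = self
            self.children.append(start)
        # Return the downstream node so chains keep extending;
        # after a fan-out there is no single downstream node, so stay here.
        return other if isinstance(other, Node) else self

    def describe(self, depth: int = 0) -> List[str]:
        """Return the tree below this node as indented lines."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.describe(depth + 1))
        return lines


ext, factorial = Node("ext"), Node("factorial")
even_only, even_ldr = Node("even_only"), Node("even_ldr")
odd_only, odd_ldr = Node("odd_only"), Node("odd_ldr")

ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]
print("\n".join(ext.describe()))
```

The point of the sketch is the evaluation order: `ext >> factorial` runs first, then each branch inside the list is built, and only then is the list attached to `factorial`.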
- - -
💘 (_**Project under construction**_)
Raw data
{
"_id": null,
"home_page": null,
"name": "cupyd",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "python, data, etl, parallelism, multiprocessing, framework, concurrency, threading",
"author": null,
"author_email": "Francisco Javier Alonso Rubio <fjalorub@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f4/d6/7a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7/cupyd-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python framework to easily build ETLs.",
"version": "0.2.0",
"project_urls": {
"Repository": "https://github.com/jalorub/cupyd.git"
},
"split_keywords": [
"python",
" data",
" etl",
" parallelism",
" multiprocessing",
" framework",
" concurrency",
" threading"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "afa8680dc08e2d6a9beb6a1a8b4ef8e062d8bb1950ed0a9d3a488b1c097849b3",
"md5": "02d4d2022ca0519aaf265afd5028a5a2",
"sha256": "1f58f8a1327e521142c3f872af97c2b079e8196959119f160a96ef22f68c14f3"
},
"downloads": -1,
"filename": "cupyd-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "02d4d2022ca0519aaf265afd5028a5a2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 36844,
"upload_time": "2024-10-14T11:31:49",
"upload_time_iso_8601": "2024-10-14T11:31:49.193225Z",
"url": "https://files.pythonhosted.org/packages/af/a8/680dc08e2d6a9beb6a1a8b4ef8e062d8bb1950ed0a9d3a488b1c097849b3/cupyd-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f4d67a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7",
"md5": "be99daa5a6e4875ecd7cbea05bbea7e4",
"sha256": "12bded78210013f45d0d5572ea96f9b9babfe503e45f0c116f95bae55ca9cb58"
},
"downloads": -1,
"filename": "cupyd-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "be99daa5a6e4875ecd7cbea05bbea7e4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 26275,
"upload_time": "2024-10-14T11:31:50",
"upload_time_iso_8601": "2024-10-14T11:31:50.214734Z",
"url": "https://files.pythonhosted.org/packages/f4/d6/7a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7/cupyd-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-14 11:31:50",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jalorub",
"github_project": "cupyd",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "cupyd"
}