# cupyd

- Version: 0.2.0
- Summary: Python framework to easily build ETLs.
- Upload time: 2024-10-14 11:31:50
- Requires Python: >=3.9
- License: MIT
- Keywords: python, data, etl, parallelism, multiprocessing, framework, concurrency, threading

[![PyPI - Version](https://img.shields.io/pypi/v/cupyd)](https://pypi.org/project/cupyd/)
![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fjalorub%2Fcupyd%2Frefs%2Fheads%2Fmain%2Fpyproject.toml&style=flat-square)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/jalorub/cupyd/ci.yaml?style=flat-square)](https://github.com/jalorub/cupyd/actions/workflows/ci.yaml?query=branch%3Amain++)
[![Coverage Status](https://coveralls.io/repos/github/jalorub/cupyd/badge.svg)](https://coveralls.io/github/jalorub/cupyd)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/cupyd?style=flat-square)](https://pypistats.org/packages/cupyd)

                                                      __     
                                                     /\ \    
      ___       __  __      _____       __  __       \_\ \   
     /'___\    /\ \/\ \    /\ '__`\    /\ \/\ \      /'_` \  
    /\ \__/    \ \ \_\ \   \ \ \L\ \   \ \ \_\ \    /\ \L\ \ 
    \ \____\    \ \____/    \ \ ,__/    \/`____ \   \ \___,_\
     \/____/     \/___/      \ \ \/      `/___/> \   \/__,_ /
                              \ \_\         /\___/           
                               \/_/         \/__/

Python framework to create your own ETLs.

## Features

- Simple but powerful syntax.
- Modular approach that encourages reusing components across different ETLs.
- Parallelism out of the box, with no need to write multiprocessing code.
- Broadly compatible:
    - Runs on Unix, Windows and macOS.
    - Supports Python >= 3.9.
- Lightweight:
    - No dependencies for its core version.
    - [WIP] API version will require [Falcon](https://falcon.readthedocs.io/en/stable/index.html),
      which is a minimalist ASGI/WSGI framework that doesn't require other packages to work.
    - [WIP] The Dashboard (full) version will require Falcon and [Dash](https://dash.plotly.com/).

## Usage

In this example we compute the factorials of 20,000 integers using multiprocessing
and store the results in two separate lists, one for even values and another for odd values.
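
To make the expected result concrete, here is a minimal sequential sketch of the same computation using only the standard library. It is not part of cupyd: the helper names `buckets` and `split_factorials` are hypothetical, the bucketing merely illustrates the idea of handing items to workers in groups of 10, and a smaller input (200 integers) keeps it quick to run. Note that only 0! and 1! are odd, so the odd list holds two values however many integers are processed.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def buckets(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists holding at most `size` items each."""
    iterator = iter(items)
    while True:
        bucket = list(islice(iterator, size))
        if not bucket:
            return
        yield bucket


def split_factorials(total_items: int, bucket_size: int = 10) -> Tuple[List[int], List[int]]:
    """Return (even_factorials, odd_factorials) for the integers 0..total_items-1."""
    even: List[int] = []
    odd: List[int] = []
    for bucket in buckets(range(total_items), bucket_size):
        for item in bucket:
            value = math.factorial(item)
            # the lowest bit decides parity: 1 means odd, 0 means even
            (odd if value & 1 else even).append(value)
    return even, odd


even_factorials, odd_factorials = split_factorials(200)
print(len(even_factorials), odd_factorials)
# prints: 198 [1, 1]
```

The cupyd version of this computation follows.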

``` py title="basic_etl.py"
import math
from typing import Any, Iterator, Optional

from cupyd import ETL, Extractor, Transformer, Loader, Filter


class IntegerExtractor(Extractor):

    def __init__(self, total_items: int):
        super().__init__()
        self.total_items = total_items

        # generated integers will be passed to the workers in buckets of size 10
        self.configuration.bucket_size = 10

    def extract(self) -> Iterator[int]:
        for item in range(self.total_items):
            yield item


class Factorial(Transformer):

    def transform(self, item: int) -> int:
        return math.factorial(item)


class EvenOnly(Filter):

    # Optional[int] instead of `int | None` keeps the annotation valid on Python 3.9
    def filter(self, item: int) -> Optional[int]:
        # keep the item only when its lowest bit is 0 (even value)
        return None if item & 1 else item


class OddOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # keep the item only when its lowest bit is 1 (odd value)
        return item if item & 1 else None


class ListLoader(Loader):

    def __init__(self):
        super().__init__()
        self.configuration.run_in_main_process = True
        self.items = []

    def start(self):
        self.items = []

    def load(self, item: Any):
        self.items.append(item)


if __name__ == "__main__":
    # 1. Define the ETL Nodes
    ext = IntegerExtractor(total_items=20_000)
    factorial = Factorial()
    even_only = EvenOnly()
    odd_only = OddOnly()
    even_ldr = ListLoader()
    odd_ldr = ListLoader()

    # 2. Connect the Nodes to determine the data flow. Notice the ETL branches after the
    # factorial is computed
    ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

    # 3. Run the ETL with 8 workers (multiprocessing Processes)
    etl = ETL(extractor=ext)
    etl.run(workers=8, show_progress=True, monitor_performance=True)

    # 4. You can access the results stored in both Loaders after the ETL is finished
    even_factorials = even_ldr.items
    odd_factorials = odd_ldr.items
```
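
The `>>` syntax in step 2 relies on Python operator overloading. The sketch below shows one way such chaining and branching could be wired up with `__rshift__`. It is an illustration under assumptions, not cupyd's implementation: the `Node` and `Chain` classes are hypothetical and only record which node feeds which.

```python
from typing import List, Union


class Node:
    """Hypothetical pipeline node; it only records its downstream nodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []

    def __rshift__(self, other: Union["Node", "Chain", list]) -> "Chain":
        # `a >> b` starts a chain whose head and only tail is `a`
        return Chain(self, [self]) >> other


class Chain:
    """Result of `>>`: remembers where a path starts and where it currently ends."""

    def __init__(self, head: Node, tails: List[Node]) -> None:
        self.head = head
        self.tails = tails

    def __rshift__(self, other: Union[Node, "Chain", list]) -> "Chain":
        # a list on the right-hand side means the flow branches
        targets = other if isinstance(other, list) else [other]
        new_tails: List[Node] = []
        for target in targets:
            branch = target if isinstance(target, Chain) else Chain(target, [target])
            for tail in self.tails:
                # connect every current end of the path to the start of the branch
                tail.children.append(branch.head)
            new_tails.extend(branch.tails)
        return Chain(self.head, new_tails)


ext, factorial = Node("ext"), Node("factorial")
even_only, odd_only = Node("even_only"), Node("odd_only")
even_ldr, odd_ldr = Node("even_ldr"), Node("odd_ldr")

ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

print([child.name for child in factorial.children])
# prints: ['even_only', 'odd_only']
```

Returning a small chain object rather than the right-hand node is what lets `even_only >> even_ldr` inside the list attach to `factorial` at its start (`even_only`) instead of at its end (`even_ldr`).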

For more information, see the [examples](cupyd/examples) directory.
- - -

💘 (_**Project under construction**_)

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cupyd",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "python, data, etl, parallelism, multiprocessing, framework, concurrency, threading",
    "author": null,
    "author_email": "Francisco Javier Alonso Rubio <fjalorub@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/f4/d6/7a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7/cupyd-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python framework to easily build ETLs.",
    "version": "0.2.0",
    "project_urls": {
        "Repository": "https://github.com/jalorub/cupyd.git"
    },
    "split_keywords": [
        "python",
        " data",
        " etl",
        " parallelism",
        " multiprocessing",
        " framework",
        " concurrency",
        " threading"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "afa8680dc08e2d6a9beb6a1a8b4ef8e062d8bb1950ed0a9d3a488b1c097849b3",
                "md5": "02d4d2022ca0519aaf265afd5028a5a2",
                "sha256": "1f58f8a1327e521142c3f872af97c2b079e8196959119f160a96ef22f68c14f3"
            },
            "downloads": -1,
            "filename": "cupyd-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "02d4d2022ca0519aaf265afd5028a5a2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 36844,
            "upload_time": "2024-10-14T11:31:49",
            "upload_time_iso_8601": "2024-10-14T11:31:49.193225Z",
            "url": "https://files.pythonhosted.org/packages/af/a8/680dc08e2d6a9beb6a1a8b4ef8e062d8bb1950ed0a9d3a488b1c097849b3/cupyd-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f4d67a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7",
                "md5": "be99daa5a6e4875ecd7cbea05bbea7e4",
                "sha256": "12bded78210013f45d0d5572ea96f9b9babfe503e45f0c116f95bae55ca9cb58"
            },
            "downloads": -1,
            "filename": "cupyd-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "be99daa5a6e4875ecd7cbea05bbea7e4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 26275,
            "upload_time": "2024-10-14T11:31:50",
            "upload_time_iso_8601": "2024-10-14T11:31:50.214734Z",
            "url": "https://files.pythonhosted.org/packages/f4/d6/7a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7/cupyd-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-14 11:31:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jalorub",
    "github_project": "cupyd",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "cupyd"
}
        