pystream-pipeline

Name	pystream-pipeline
Version	0.2.0
home_page	https://github.com/MukhlasAdib/pystream-pipeline
Summary	Python package to create and manage fast parallelized data processing pipeline for real-time application
upload_time	2023-11-04 14:50:23
author	Mukhlas Adib
requires_python	>=3.8,<4.0
license	MIT
keywords	data-pipeline parallelization data-processing performance real-time

# PyStream - Real Time Python Pipeline Manager

This package provides tools to build and speed up Python data pipelines for real-time processing. The package is managed with [Poetry](https://python-poetry.org/).

For more detailed guidelines, see the project [documentation](https://pystream-pipeline.readthedocs.io/).

## Concepts

PyStream is a package, fully implemented in Python, that helps you manage a data pipeline and optimize its performance. Its main feature is that it can build your data pipeline as a set of asynchronous, independent, multi-threaded stages, and hopefully as a multi-process model in the future.

A PyStream **pipeline** is composed of several **stages**, where each stage represents a single set of data processing operations that you define yourself. Once the stages have been defined, the pipeline can be operated in serial mode, in parallel mode, or in a mix of the two:

- **Serial mode:** In this mode, stages are executed in a blocking fashion. A stage runs only after the previous stages have finished, and the next data item is processed only after the previous item has passed through the final stage. Only one data item is in the pipeline at any time.

- **Parallel mode:** In this mode, each stage lives in its own thread. When a stage finishes processing a data item, it sends the result to the next stage. Because the stages run in parallel, a stage can immediately take the next input, if one is available, and start processing it. This way, multiple data items are processed at the same time, which increases the throughput of your pipeline.

- **Mixed mode:** This is a mix of the serial and parallel modes: you can put a serial pipeline inside a parallel one and vice versa. A parallel pipeline can improve throughput, but it is prone to higher latency. Mixing serial and parallel pipelines can be very useful for further optimizing both the latency and the throughput of your pipeline.
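To make the throughput/latency trade-off concrete, here is an idealized back-of-the-envelope model in plain Python. It is an illustration, not part of PyStream: the function names and the per-stage timings are hypothetical, and the model assumes each stage takes a fixed time per item and ignores thread hand-off and queueing overhead, which can add latency in practice.

```python
from typing import Sequence


def serial_throughput(stage_times: Sequence[float]) -> float:
    """Items/s when an item must leave the last stage before the next item enters."""
    return 1.0 / sum(stage_times)


def parallel_throughput(stage_times: Sequence[float]) -> float:
    """Ideal steady-state items/s with one thread per stage: the slowest stage is the bottleneck."""
    return 1.0 / max(stage_times)


def mixed_throughput(groups: Sequence[Sequence[float]]) -> float:
    """Each group is a serial sub-pipeline in one thread; the groups run in parallel."""
    return 1.0 / max(sum(group) for group in groups)


def ideal_latency(stage_times: Sequence[float]) -> float:
    """Seconds for one item to traverse all stages, ignoring time spent waiting in queues."""
    return sum(stage_times)


# Hypothetical per-stage processing times, in seconds.
times = [0.02, 0.05, 0.03]
print(round(serial_throughput(times), 2))                  # 10.0
print(round(parallel_throughput(times), 2))                # 20.0
print(round(mixed_throughput([[0.02, 0.03], [0.05]]), 2))  # 20.0
print(round(ideal_latency(times), 2))                      # 0.1
```

With these made-up timings, parallel mode doubles throughput because the slowest stage (0.05 s) becomes the bottleneck, while the ideal per-item latency stays at 0.1 s in every mode. Grouping the two faster stages serially inside one thread keeps the same throughput with one fewer thread.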

Whichever mode you choose, you only need to implement your own data processing code and pack it into stages. PyStream handles the pipeline execution, including the threads and the linking of stages, for you.
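For intuition about what handling the threads and linking the stages involves, below is a minimal stdlib-only sketch. It is not PyStream's implementation or API: `run_serial`, `run_parallel`, and `as_stage` are hypothetical names, each stage is assumed to be a plain callable that takes one item and returns one result, and stages are linked with `queue.Queue` objects, one worker thread per stage.

```python
import queue
import threading
from typing import Any, Callable, List

Stage = Callable[[Any], Any]
_STOP = object()  # sentinel that tells a worker thread to shut down


def as_stage(*stages: Stage) -> Stage:
    """Chain stages serially into one callable, e.g. to nest a serial pipeline in a parallel one."""
    def combined(item: Any) -> Any:
        for stage in stages:
            item = stage(item)
        return item
    return combined


def run_serial(stages: List[Stage], items: List[Any]) -> List[Any]:
    """Serial mode: every item passes through all stages before the next one starts."""
    pipeline = as_stage(*stages)
    return [pipeline(item) for item in items]


def run_parallel(stages: List[Stage], items: List[Any]) -> List[Any]:
    """Parallel mode: one thread per stage, linked by FIFO queues."""
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage: Stage, inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # propagate shutdown to the next stage
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, links[i], links[i + 1]), daemon=True)
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    # Feed inputs without waiting, so early stages can work on new items
    # while later stages are still busy with older ones.
    for item in items:
        links[0].put(item)
    links[0].put(_STOP)

    results = []
    while True:
        item = links[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


stages = [lambda x: x + 1, lambda x: x * 2, str]
print(run_serial(stages, [1, 2, 3]))    # ['4', '6', '8']
print(run_parallel(stages, [1, 2, 3]))  # ['4', '6', '8']
# Mixed: the first two stages share one thread, `str` runs in another.
print(run_parallel([as_stage(stages[0], stages[1]), str], [1, 2, 3]))  # ['4', '6', '8']
```

Because each stage is a single thread reading from a FIFO queue, results come out in input order. A production version would also need error handling inside `worker`; in CPython, threads mostly pay off when stages spend their time in I/O or in C extensions that release the GIL.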

## Installation

You can install this package using `pip`.

```bash
pip install pystream-pipeline
```
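After installing, you can check which version ended up in the current environment with the standard library's `importlib.metadata`, which is available from Python 3.8, matching the package's `requires_python`. The helper below is a small illustration of our own, not part of PyStream; it looks up the distribution name `pystream-pipeline` used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pystream-pipeline") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version())
```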

If you want to build this package from source or develop it, we recommend using Poetry. First, install Poetry by following the instructions on [its documentation site](https://python-poetry.org/docs/#installation). Then clone this repository and install all the dependencies; Poetry does this for you and also sets up a new virtual environment.

```bash
poetry install --with dev
```

To build the wheel file, run:

```bash
poetry build
```

The wheel file is placed in the `dist` directory.
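A wheel is a standard zip archive, so you can inspect what `poetry build` packaged using only the standard library. The helper below is a hypothetical convenience, not part of PyStream or Poetry, and assumes it is called from the repository root, where `dist` lives.

```python
import glob
import os
import zipfile
from typing import List


def list_wheel_contents(dist_dir: str = "dist") -> List[str]:
    """Return the member names of the last wheel (in sorted filename order) in dist_dir."""
    wheels = sorted(glob.glob(os.path.join(dist_dir, "*.whl")))
    if not wheels:
        raise FileNotFoundError(f"no .whl file in {dist_dir!r}; run `poetry build` first")
    with zipfile.ZipFile(wheels[-1]) as wheel:  # a wheel is an ordinary zip file
        return wheel.namelist()
```

Calling `list_wheel_contents()` after `poetry build` should list the package modules together with the wheel's `.dist-info` metadata files.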

## Sample Usage

The PyStream API reference is available in the project [documentation](https://pystream-pipeline.readthedocs.io/).

Some examples are also available:

- See [`demo.ipynb`](demo.ipynb) for a quick start with PyStream.
- See how PyStream is used to increase the throughput of a vehicle environment mapping system in [this repository](https://github.com/MukhlasAdib/KITTI_Mapping/tree/main/app).
