procpipe

- Name: procpipe
- Version: 2020.10
- Home page: https://github.com/cdaudt/pipeline
- Summary: Simple data processing pipelines
- Upload time: 2020-10-17 21:39:03
- Author: Christian Daudt
- Author email: csd@fixthebug.org
- Requires Python: >=3.6
- Requirements: none recorded
![Python package](https://github.com/cdaudt/pipeline/workflows/Python%20package/badge.svg)

# Simple data processing pipelines

Create simple data pipelines from source and sink stages that can
process or drop elements. Each pipeline stage receives a dictionary
holding the element's data and metadata; it can add or remove fields,
or terminate processing of the element.
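The per-element contract described above can be sketched with plain functions. This is a hypothetical illustration, not procpipe's implementation: `run_stages`, `add_length` and `drop_short` are made-up names, and a returned `None` is assumed to mean "drop the element", matching the `DropSmallWord` example later in this README.

```python
from typing import Callable, Iterable, Optional

# An element is a plain dictionary of data and metadata fields.
Element = dict
# Hypothetical stage type: returns the (possibly modified) element, or None to drop it.
Stage = Callable[[Element], Optional[Element]]


def run_stages(element: Element, stages: Iterable[Stage]) -> Optional[Element]:
    """Pass one element through each stage in order, stopping if a stage drops it."""
    for stage in stages:
        element = stage(element)
        if element is None:  # this stage terminated processing of the element
            return None
    return element


def add_length(element: Element) -> Element:
    # Add a new field derived from the existing data.
    element["length"] = len(element["word"])
    return element


def drop_short(element: Element) -> Optional[Element]:
    # Terminate processing of words shorter than 5 characters.
    return element if element["length"] >= 5 else None


print(run_stages({"word_id": 0, "word": "pipeline"}, [add_length, drop_short]))
# {'word_id': 0, 'word': 'pipeline', 'length': 8}
print(run_stages({"word_id": 1, "word": "cat"}, [add_length, drop_short]))
# None
```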

For example, a pipeline could consist of a generator (the **source**) that reads image files from a directory, followed by a stage that resizes each image and then a stage that saves each image.

# Stages

Each stage can contain a **source**, a **sink**, or both. Sources generate elements, while sinks process and optionally drop elements from the pipeline.

## Source Stage

A source stage is defined by subclassing the `Pipeline` class and implementing, at a minimum, a `source` method, as in the example below.

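```python
import pipeline  # module name as used throughout this README


class ArraySource(pipeline.Pipeline):
    """Source stage: emits one element per word in `arr`."""

    def __init__(self, sink, arr):
        self.arr = arr
        super().__init__(sink)

    def source(self):
        # Yield one element (a dict of data and metadata) per word.
        for i, word in enumerate(self.arr):
            element = {
                "word_id": i,
                "word": word,
            }
            yield element
```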
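Since `source` is an ordinary Python generator, its output can be checked without building a whole pipeline. The hypothetical `array_source` function below copies the method body so it runs without the `pipeline` module; it also shows that elements are produced lazily, one at a time.

```python
from typing import Iterator, List


def array_source(arr: List[str]) -> Iterator[dict]:
    """Standalone copy of the ArraySource.source() logic: one element per word."""
    for i, word in enumerate(arr):
        yield {"word_id": i, "word": word}


gen = array_source(["to", "pipe"])
print(next(gen))  # {'word_id': 0, 'word': 'to'}
print(list(gen))  # [{'word_id': 1, 'word': 'pipe'}]
```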
## Sink Stage
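A sink stage is defined by subclassing the `Pipeline` class and implementing, at a minimum, a `sink` method, as in the example below. In this example, returning `None` drops the element, while returning the element keeps it in the pipeline.

```python
import pipeline  # module name as used throughout this README


class DropSmallWord(pipeline.Pipeline):
    """Sink stage: drops elements whose word is shorter than `min_len`."""

    def __init__(self, sink, min_len):
        self.min_len = min_len  # minimum word length to keep
        super().__init__(sink)

    def sink(self, element):
        if len(element["word"]) < self.min_len:
            return None  # drop the element
        return element  # keep the element unchanged
```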

# Elements

Elements are the units of data passed through the processing pipeline. An element is a dictionary that can contain any number of fields. Both the data and metadata about the data unit can be stored in the element.

# Creating the Pipeline
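To create a pipeline, create the stages and link them to each other, starting from the final stage and working back to the source:

```python
words = ["a", "simple", "data", "pipeline"]  # example input

pw = PrintWord(None)        # final stage that prints each word (not shown here)
ds = DropSmallWord(pw, 5)   # drops words shorter than 5 characters
a = ArraySource(ds, words)  # source stage: emits one element per word
```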
The final stage is constructed with `None` as its sink, while every other stage receives the next stage as its sink.
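The mechanism that moves elements between linked stages, and the call that starts a pipeline, are not shown in this README. The sketch below illustrates one plausible way the chaining could work, using a hypothetical stand-in base class `MiniPipeline` with a made-up `run()` method; it is not the procpipe implementation. The constructor argument is stored as `next_stage` so it does not shadow the `sink` method, and the hypothetical `CollectWord` replaces `PrintWord` so the result can be checked.

```python
from typing import Iterator, List, Optional


class MiniPipeline:
    """Hypothetical stand-in for pipeline.Pipeline (not the real implementation)."""

    def __init__(self, sink: Optional["MiniPipeline"]):
        # The next stage in the chain, or None for the final stage.
        self.next_stage = sink

    def source(self) -> Iterator[dict]:
        # Default: a stage without a source produces no elements.
        return iter(())

    def sink(self, element: dict) -> Optional[dict]:
        # Default: pass the element through unchanged.
        return element

    def push(self, element: dict) -> None:
        # Process one element, then forward it unless this stage dropped it.
        result = self.sink(element)
        if result is not None and self.next_stage is not None:
            self.next_stage.push(result)

    def run(self) -> None:
        # Hypothetical entry point: feed every source element to the next stage.
        for element in self.source():
            if self.next_stage is not None:
                self.next_stage.push(element)


class ArraySource(MiniPipeline):
    def __init__(self, sink, arr: List[str]):
        self.arr = arr
        super().__init__(sink)

    def source(self):
        for i, word in enumerate(self.arr):
            yield {"word_id": i, "word": word}


class DropSmallWord(MiniPipeline):
    def __init__(self, sink, min_len: int):
        self.min_len = min_len
        super().__init__(sink)

    def sink(self, element):
        return None if len(element["word"]) < self.min_len else element


class CollectWord(MiniPipeline):
    """Final stage that records words instead of printing them."""

    def __init__(self, sink):
        super().__init__(sink)
        self.words: List[str] = []

    def sink(self, element):
        self.words.append(element["word"])
        return element


# Build from the final stage back to the source.
words = ["a", "simple", "data", "pipeline"]
cw = CollectWord(None)
ds = DropSmallWord(cw, 5)
a = ArraySource(ds, words)
a.run()
print(cw.words)  # ['simple', 'pipeline']
```

In this sketch each element travels through the whole chain by direct method calls before the source yields the next one, so no intermediate queue is needed.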

# Examples
The `examples` subdirectory contains these examples:

* `feeder.py`: Feeds an array of words into a filter stage that drops short words, followed by a stage that prints the remaining words.
* `resizer.py`: Reads image files given on the command line and passes them through a resizer stage, followed by an image-save stage.