# jpl.pipedreams

- Name: jpl.pipedreams
- Version: 1.0.5
- Summary: Pipe Dreams: API for publication of scientific data
- Home page: https://github.com/EDRN/jpl.pipedreams
- Author: Asitang Mishra
- Maintainer: Sean Kelly
- License: ALv2
- Requires Python: >=3.7, <3.10
- Keywords: science, data, analysis, archive, catalog, publication, pipes
- Uploaded: 2024-02-28 21:26:51
- Requirements: none recorded
# 🔬 Pipe Dreams

### Do you want to:

- Organize your huge pile of loose scripts?
- Create neat and reusable Python pipelines to process your data or run jobs?
- Have graph-based (DAG) parallelization without too much fuss?

Well, you're in the right place. Pipe Dreams is a super-lightweight application programmer interface (API) that supports the construction and processing of data pipes for scientific data. It was built primarily for the [Laboratory Catalog and Archive System](https://github.com/EDRN/labcas-backend), but is now open for use with other systems.

### How do we do it:

- We use Python dictionaries to encapsulate all the intermediate results/data flowing through the pipeline, so you can not only declare and run a sequence of functions but also wire individual output variables to specific input parameters. What's more, you can rename, merge, and exercise other fine-grained control over your intermediate results.
- We provide a `Plugin` class that can be subclassed to organize your Python functions, which you then call using their relative string paths in our framework.
- We use [Celery](https://pypi.org/project/celery/), [Redis](https://redis.io/), and [NetworkX](https://pypi.org/project/networkx/) to parallelize your workflows with minimal setup on the user's part.
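To illustrate the first idea, here is a minimal sketch (plain Python, *not* the actual jpl.pipedreams API) of how a shared dictionary can carry intermediate results between steps, with an upstream output wired to a downstream input by key:

```python
# Hypothetical sketch of dictionary-based result passing; the function
# names and key-wiring parameters here are illustrative, not part of
# the jpl.pipedreams API.

def read_data(results):
    results["raw"] = "hello world"   # a step deposits its output under a key
    return results

def uppercase(results, source_key="raw", target_key="shouted"):
    # Wiring: a specific upstream output key feeds this step's input,
    # and the result is stored under a (renamable) target key.
    results[target_key] = results[source_key].upper()
    return results

results = {}
for step in (read_data, uppercase):
    results = step(results)

print(results["shouted"])  # HELLO WORLD
```

Because every step reads from and writes to the same dictionary, renaming or merging intermediate results is just key manipulation.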


## 🚗 Starting Redis

The Pipe Dreams API requires [Redis](https://redis.io/) to run. To start Redis (assuming [Docker](https://www.docker.com/) is installed), run:

```console
$ docker container run \
    --name labcas-redis \
    --publish 6379:6379 \
    --detach \
    redis:6.2.4-alpine
```
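You can confirm the container is accepting connections with `docker exec labcas-redis redis-cli ping`, which should print `PONG`. As an aside, Redis speaks the simple RESP wire protocol under the hood; clients such as Celery's broker layer frame every command like this (an illustrative sketch, not something you need to write yourself):

```python
# Illustrative: how a Redis command such as PING is framed on the wire
# using RESP (REdis Serialization Protocol) — an array of bulk strings.

def encode_resp(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

print(encode_resp("PING"))  # b'*1\r\n$4\r\nPING\r\n'
```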

## 💿 Installing Pipe Dreams

Pipe Dreams is an open source, installable Python package. It requires [Python 3.7](https://www.python.org/) or later. Typically, you'd install it into a [Python virtual environment](https://docs.python.org/3/tutorial/venv.html), but you can also put it into a [Conda](https://docs.conda.io/en/latest/) environment or—if you must—your system's Python.

To use a virtual environment, run:

```console
$ python3 -m venv venv
$ venv/bin/pip install --upgrade setuptools pip wheel
$ venv/bin/pip install jpl.pipedreams
$ source venv/bin/activate  # or use activate.csh or activate.fish as needed
```

Once this is done, you can run `venv/bin/python` as your Python interpreter and it will have the Pipe Dreams API (and all its dependencies) ready for use. Note that the `activate` step, although deprecated, is still necessary in order to have the `celery` program on your execution path.

👉 **Note:** As of release 1.0.3 of Pipe Dreams, Python 3.7 through Python 3.9 are supported. Python 3.10 is not yet endorsed by this package.


## ๐Ÿ‘ฉโ€๐Ÿ’ป Customizing the Workflow

The next step is to create a workflow that defines the processing steps to publish the data. As an example, see `demo/demo.py`, which is [available from the GitHub release of this package](https://github.com/EDRN/jpl.pipedreams/releases/).

In summary, you need to:

1.  Create an `Operation` instance.
2.  Add pipes (a sequence of named functions) to the instance.
3.  Run the operation in either a single process or multiple processes.
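The steps above boil down to executing a task graph of named functions in dependency order. Here is a toy, stdlib-only sketch of that idea (again, *not* the jpl.pipedreams API — the real package builds its task graph with NetworkX and can run it in parallel via Celery workers):

```python
# Hypothetical mini-pipeline: named functions executed in DAG order,
# passing a shared results dictionary. Uses graphlib (stdlib, 3.9+).
from graphlib import TopologicalSorter

def read(data):
    data["text"] = "pipe dreams"
    return data

def upper(data):
    data["text"] = data["text"].upper()
    return data

def report(data):
    data["length"] = len(data["text"])
    return data

pipes = {"read": read, "upper": upper, "report": report}
# Each task maps to the set of tasks it depends on.
graph = {"upper": {"read"}, "report": {"upper"}}

data = {}
for name in TopologicalSorter(graph).static_order():
    data = pipes[name](data)

print(data)  # {'text': 'PIPE DREAMS', 'length': 11}
```

In a DAG like this, tasks with no path between them have no ordering constraint, which is exactly what makes parallel execution safe.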


## 📗 Process Your Data Pipes

Finally, with Redis running and a custom workflow defined, you can then execute your pipeline.

As an example, we provide a demonstration workflow and associated test data. You can run it (assuming you've got the virtual Python environment from above) as follows:

```console
$ curl -L https://github.com/EDRN/jpl.pipedreams/releases/download/v1.0.2/demo.tar.gz | tar xzf -
$ cd demo
$ ../venv/bin/pip install --requirement requirements.txt
$ ../venv/bin/python demo.py
Adding Node: hello_world_read|+|mydata0.txt
โ€ฆ
num nodes in task graph: 7
num task completed: 7
time taken: 0:00:00.NNNNN
```

That's it 🥳
