dpypeline

Name: dpypeline
Version: 0.1.0b4 (PyPI)
Summary: Program for creating data pipelines triggered by file creation events.
Upload time: 2023-09-04 15:19:04
Author: Joao Morado
Requires Python: >=3.9
Keywords: data, pipeline, data-pypeline, dpypeline, pypeline, noc
            # dpypeline
![Continuous Integration](https://github.com/NOC-OI/object-store-project/actions/workflows/main.yml/badge.svg)
[![PyPI version](https://badge.fury.io/py/dpypeline.svg)](https://badge.fury.io/py/dpypeline)
![Test Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/JMorado/c20a3ec5262f14d970a462403316a547/raw/pytest_coverage_report_main.json)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Program for creating data pipelines triggered by file creation events.

## Version

0.1.0-beta.4

## Python environment setup

To use this package, install it within a dedicated Conda environment. You can create this environment with the following command:

```
conda create --name <environment_name> python=3.10
```

To activate the Conda environment, use:
```
conda activate <environment_name>
```

Alternatively, use the built-in `venv` module to set up and activate the environment:

```
python -m venv <environment_name>
source <environment_name>/bin/activate
```

## Installation

1. Clone the repository:

```
git clone git@github.com:NOC-OI/dpypeline.git
```

2. Navigate to the package directory:

After cloning the repository, navigate to the root directory of the package.

3. Install in editable mode:

To install `dpypeline` in editable mode, execute the following command from the root directory:

```
pip install -e .
```

This command will install the library in editable mode, allowing you to make changes to the code if needed.

4. Alternative installation methods:

- Install from the GitHub repository directly:


```
pip install git+https://github.com/NOC-OI/dpypeline.git@main#egg=dpypeline
```

- Install from the PyPI repository:

```
pip install dpypeline
```

## Unit tests

Run the tests with `pytest` from the root directory:

```
pip install pytest
pytest
```
## Examples

### Python scripts

Example Python scripts showing how to use this package can be found in the `examples` directory.

### Command line interface (CLI)

The CLI provided by this package allows you to execute data pipelines defined in YAML files, although it offers less flexibility than the Python scripts. To run the dpypeline CLI, use a command such as:

```bash
dpypeline -i <input_file> > output 2> errors
```

#### Flags description


- `-h` or `--help`: show a help message
- `-i INPUT_FILE` or `--input INPUT_FILE`: filepath to the pipeline YAML file (default: `pipeline.yaml`)
- `-v` or `--version`: show dpypeline's version number


### Environment variables

A few environment variables must be set for the application to run correctly:

- `CACHE_DIR`: Path to the cache directory.
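For example, a wrapper script can set `CACHE_DIR` before the pipeline starts. The snippet below is a minimal sketch using a temporary directory; the directory prefix is illustrative, not a dpypeline convention:

```python
import os
import tempfile

# Point CACHE_DIR at a writable directory before the pipeline starts.
# mkdtemp creates a fresh, private directory and returns its path.
os.environ["CACHE_DIR"] = tempfile.mkdtemp(prefix="dpypeline_cache_")

print(os.environ["CACHE_DIR"])
```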

## Software Workflow Overview

## Pipeline architectures

![Dpypeline diagram](/images/dpypeline_diagram.png)


### Thread-based pipeline

In the thread-based pipeline, `Akita` enqueues events into an in-memory queue. These events are subsequently consumed by `ConsumerSerial`, which generates jobs for sequential execution within the `ThreadPipeline` (an alias for `BasicPipeline`).
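This flow can be sketched with the standard library alone. The watcher and consumer below only mimic the roles of `Akita` and `ConsumerSerial`; they are not the package's actual API:

```python
import queue
import threading

events = queue.Queue()  # in-memory event queue (the role of Akita's queue)
processed = []

def watcher():
    """Mimics Akita: enqueue file-creation events, then a stop sentinel."""
    for path in ("a.nc", "b.nc", "c.nc"):
        events.put(path)
    events.put(None)  # sentinel: no more events

def consumer():
    """Mimics ConsumerSerial: dequeue events and run jobs one at a time."""
    while (event := events.get()) is not None:
        processed.append(f"processed {event}")

threading.Thread(target=watcher).start()
consumer()
print(processed)  # → ['processed a.nc', 'processed b.nc', 'processed c.nc']
```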

### Parallel pipeline

In the parallel pipeline, `Akita` enqueues events into an in-memory queue. These events are then consumed by `ConsumerParallel`, which generates futures that are executed concurrently by multiple Dask workers.
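A stdlib analogy of the same idea, using `concurrent.futures` threads in place of Dask workers (the function name is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_pipeline(event):
    """Stand-in for the pipeline jobs generated for each event."""
    return f"processed {event}"

events = ["a.nc", "b.nc", "c.nc"]

# Each event becomes a future; futures complete concurrently,
# so results are collected in completion order, not arrival order.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_pipeline, e) for e in events]
    results = {future.result() for future in as_completed(futures)}

print(sorted(results))
```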

            
