# dpypeline
![Continuous Integration](https://github.com/NOC-OI/object-store-project/actions/workflows/main.yml/badge.svg)
[![PyPI version](https://badge.fury.io/py/dpypeline.svg)](https://badge.fury.io/py/dpypeline)
![Test Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/JMorado/c20a3ec5262f14d970a462403316a547/raw/pytest_coverage_report_main.json)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
Program for creating data pipelines triggered by file creation events.
## Version
0.1.0-beta.4
## Python environment setup
We recommend installing this package in a dedicated Conda environment, which you can create with:
```
conda create --name <environment_name> python=3.10
```
To activate the Conda environment, use:
```
conda activate <environment_name>
```
Alternatively, use `venv` to set up and activate the environment:
```
python -m venv <environment_name>
source <environment_name>/bin/activate
```
## Installation
1. Clone the repository:
```
git clone git@github.com:NOC-OI/dpypeline.git
```
2. Navigate to the package directory:
After cloning the repository, navigate to the root directory of the package.
3. Install in editable mode:
To install `dpypeline` in editable mode, execute the following command from the root directory:
```
pip install -e .
```
This command will install the library in editable mode, allowing you to make changes to the code if needed.
4. Alternative installation methods:
- Install from the GitHub repository directly:
```
pip install git+https://github.com/NOC-OI/dpypeline.git@main#egg=dpypeline
```
- Install from the PyPI repository:
```
pip install dpypeline
```
## Unit tests
Run tests using `pytest` in the main directory:
```
pip install pytest
pytest
```
## Examples
### Python scripts
Example Python scripts showing how to use this package can be found in the `examples` directory.
### Command line interface (CLI)
The CLI provided by this package allows you to execute data pipelines defined in YAML files; however, it offers less flexibility than the Python scripts. To run the dpypeline CLI, use a command such as:
```bash
dpypeline -i <input_file> > output 2> errors
```
#### Flags description
- `-h` or `--help`: show a help message
- `-i INPUT_FILE` or `--input INPUT_FILE`: Filepath to the pipeline YAML file (by default `pipeline.yaml`)
- `-v` or `--version`: show dpypeline's version number
### Environment variables
The following environment variable needs to be set so that the application can run correctly:
- `CACHE_DIR`: Path to the cache directory.
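You can set the variable in the shell before launching the CLI (e.g. `export CACHE_DIR=/tmp/dpypeline-cache`). As an illustrative sketch (not dpypeline's actual internal lookup), an application would typically resolve such a variable like this:

```python
import os
import tempfile

# Resolve the cache directory from the environment; the fallback default
# shown here is purely illustrative (dpypeline itself expects CACHE_DIR
# to be set explicitly).
cache_dir = os.environ.get(
    "CACHE_DIR", os.path.join(tempfile.gettempdir(), "dpypeline-cache")
)

# Make sure the directory exists before the pipeline writes to it.
os.makedirs(cache_dir, exist_ok=True)
print(cache_dir)
```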
## Software Workflow Overview
## Pipeline architectures
![Dpypeline diagram](/images/dpypeline_diagram.png)
### Thread-based pipeline
In the thread-based pipeline, `Akita` enqueues events into an in-memory queue. These events are subsequently consumed by `ConsumerSerial`, which generates jobs for sequential execution within the `ThreadPipeline` (an alias for `BasicPipeline`).
### Parallel pipeline
In the parallel pipeline, `Akita` enqueues events into an in-memory queue. These events are then consumed by `ConsumerParallel`, which generates futures that are executed concurrently by multiple Dask workers.
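The thread-based architecture described above follows the classic producer/consumer pattern. The sketch below illustrates that pattern with Python's standard `queue` and `threading` modules; the names and structure are purely illustrative and do not reflect dpypeline's actual API:

```python
import queue
import threading

# Shared in-memory queue: the watcher (playing the role of Akita) produces
# events, and a serial consumer (playing the role of ConsumerSerial)
# executes one job per event, in order.
events: queue.Queue = queue.Queue()
results: list = []

def watcher(paths):
    """Enqueue one event per created file, then signal shutdown."""
    for path in paths:
        events.put(path)
    events.put(None)  # sentinel: no more events will arrive

def consumer_serial():
    """Consume events one at a time, executing jobs sequentially."""
    while True:
        event = events.get()
        if event is None:
            break
        results.append(f"processed {event}")

producer = threading.Thread(target=watcher, args=(["a.nc", "b.nc"],))
worker = threading.Thread(target=consumer_serial)
producer.start()
worker.start()
producer.join()
worker.join()
print(results)
```

In the parallel variant, the serial consumer would instead submit each event to a pool of workers (e.g. Dask futures) rather than processing events in order on a single thread.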