# hera_opm
[![Run Tests](https://github.com/HERA-Team/hera_opm/workflows/Run%20Tests/badge.svg)](https://github.com/HERA-Team/hera_opm/actions)
[![Code Coverage](https://codecov.io/gh/HERA-Team/hera_opm/branch/main/graph/badge.svg?token=cFmFFBVHZP)](https://codecov.io/gh/HERA-Team/hera_opm)
[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
`hera_opm` provides a convenient and flexible framework for developing data
analysis pipelines for operating on HERA data. It facilitates "offline
processing", and is portable enough to operate on computer clusters with
batch submission systems or on local machines.
# How It Works
The `hera_opm` package uses the `makeflow` system, which is a part of the
[Cooperative Computing Tools
package](https://github.com/cooperative-computing-lab/cctools) developed by the
[Cooperative Computing Lab](http://ccl.cse.nd.edu). The `hera_opm` package
essentially converts a pipeline defined in a configuration file into a format
that can be parsed by `makeflow`. This process is also aware of aspects specific
to HERA data, such as the polarization features of the data, in order to build
an appropriate software pipeline. Once the `makeflow` instructions file has been
generated, the `makeflow` program itself is used to execute the steps in the
pipeline.
There are generally five steps required to "build a pipeline":
1. Write *task scripts* that will be executed by `makeflow` for a given stage in
the pipeline. These scripts should generally be as atomic as possible, each
performing only a single logical component of the pipeline (though a script may
in turn call several supporting scripts or commands).
2. Write a *configuration file* which defines the order of tasks to be
completed. This configuration file defines the logical flow of the pipeline, as
well as prerequisites for each task. It also allows for defining compute and
memory requirements, for systems that support resource management.
3. Use the provided `build_makeflow_from_config.py` script to build a `makeflow`
instruction file that specifies the pipeline tasks applied to the data files.
4. Use the provided `makeflow_nrao.sh` or `makeflow_local.sh` to execute the
pipeline in either the NRAO batch scheduler environment, or on a local machine,
respectively.
5. (Optional) Use the provided `clean_up_makeflow.py` to clean up the work
directory for makeflow. This will remove the wrapper scripts and output files,
and generate a single log file for all jobs in the makeflow.
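To make the conversion step more concrete, the following is a minimal sketch of how an ordered list of tasks could be expanded into Make-style rules of the kind `makeflow` parses (a target, its sources, and a tab-indented command). It is not the `hera_opm` implementation: the function `build_rules`, the `do_<task>.sh` script names, the `.out` marker files, and the example file name are all hypothetical and chosen only for illustration.

```python
def build_rules(tasks, data_files):
    """Return Make-style rules, one per (task, data file) pair.

    Hypothetical convention: each rule writes a marker file, and each task
    depends on the marker file of the previous task for the same data file.
    """
    rules = []
    for filename in data_files:
        previous_marker = None
        for task in tasks:
            marker = f"{filename}.{task}.out"
            # The task script is always a source; the previous marker is a
            # source only when there is an earlier task in the chain.
            sources = [f"do_{task}.sh"]
            if previous_marker is not None:
                sources.append(previous_marker)
            command = f"./do_{task}.sh {filename} > {marker}"
            rules.append(f"{marker}: {' '.join(sources)}\n\t{command}")
            previous_marker = marker
    return rules


if __name__ == "__main__":
    # Two hypothetical tasks applied in order to one hypothetical data file.
    print("\n\n".join(build_rules(["TASK_A", "TASK_B"], ["file1.uvh5"])))
```

Chaining each rule to the previous task's marker file is what lets the workflow engine run tasks for different data files in parallel while keeping the per-file ordering intact.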
# Installation
To install the `hera_opm` package, run the following from the root of the repository:
```
pip install .
```
As mentioned above, `hera_opm` uses `makeflow` as the backing pipeline management
software. As such, `makeflow` must be installed. To install `makeflow` in your
home directory:
```bash
git clone https://github.com/cooperative-computing-lab/cctools.git
cd cctools
./configure --prefix=${HOME}/cctools
make clean
make install
export PATH=${PATH}:${HOME}/cctools/bin
```
For convenience, it is helpful to add the `export` statement to your `.bashrc`
file, so that the `makeflow` commands are always on your `PATH`.
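As a quick sanity check after installation, a short script along these lines can confirm that the `makeflow` executable is discoverable on your `PATH`. This is only a sketch using the Python standard library; the helper name `find_executable` is an assumption made for this example and is not part of `hera_opm`.

```python
import shutil


def find_executable(name="makeflow"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("makeflow was not found on PATH; check the export statement.")
    else:
        print(f"makeflow found at {path}")
```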
## Dependencies
When installing the package, setuptools will attempt to download and install any
missing dependencies. If you prefer to manage your own Python environment
(through conda, pip, or some other manager), you can install them yourself.
### Required
* toml >= 0.9.4
### Optional
* [hera_cal](https://github.com/HERA-Team/hera_cal)
Generating an `lstbin` pipeline (instead of `analysis`) requires that `hera_cal`
be installed. The main package and tests can be run without this requirement.
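If you are unsure whether the optional dependency is available in your environment, a check like the one below may help before attempting to generate an `lstbin` pipeline. It is a sketch that relies only on the standard library; the helper name `has_module` is hypothetical and is not something `hera_opm` provides.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("hera_cal"):
        print("hera_cal is installed; lstbin pipelines can be generated.")
    else:
        print("hera_cal is not installed; only analysis pipelines are available.")
```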
# Task Scripts and Config Files
For documentation on building task scripts, see [the task scripts docs
page](docs/task_scripts.md). For documentation on config files, see [the config
file docs page](docs/config_files.md).
# Testing
`hera_opm` uses `pytest` as its testing framework. To run the test suite, run:
```
pytest
```
from the root repo directory. This may require running `pip install .[test]` to
install testing dependencies.