# Hybrid Multi-Cloud Analytics Services Framework
**Cloudmesh Controlled Computing through Workflows**
Gregor von Laszewski (laszewski@gmail.com)$^*$,
Jacques Fleischer
$^*$ Corresponding author
## Citation
* <https://arxiv.org/pdf/2210.16941>
* <https://github.com/cyberaide/paper-cloudmesh-cc/raw/main/vonLaszewski-cloudmesh-cc.pdf>
```
@misc{las-2022-hybrid-cc,
title = {Hybrid Reusable Computational Analytics Workflow
Management with Cloudmesh},
author = {Gregor von Laszewski and J. P. Fleischer and
Geoffrey C. Fox},
year = 2022,
eprint = {2210.16941},
archivePrefix ={arXiv},
primaryClass = {cs.DC},
url = {https://arxiv.org/pdf/2210.16941},
urlOPT =
{https://github.com/cyberaide/paper-cloudmesh-cc/raw/main/vonLaszewski-cloudmesh-cc.pdf}
}
```
## Background
High-performance computing (HPC) has been for decades a very important tool
for science. Scientific tasks can leverage the processing power
of a supercomputer so they can run at previously unobtainable
speeds or utilize specialized acceleration hardware that is otherwise
not available to the user. HPC can be used for analytic programs
that apply machine learning to large data sets to, for
example, predict future values or model current states. Such
high-complexity projects often involve multiple complex programs
that may run repeatedly in either competition or
cooperation. Leveraging, for example, GPUs leads to
several times higher performance when applied to deep learning
algorithms. In such projects, program execution is submitted as a
job to a typically remote HPC center, where time is billed in node
hours. Such projects need a service that lets the user manage and
execute jobs without supervision. We have created a service that lets the
user run jobs across multiple platforms in a dynamic queue with
visualization and data storage.
See @fig:fastapi-service.
![OpenAPI Description of the REST Interface to the Workflow](images/fastapi-service.png){#fig:fastapi-service width=50%}
## Workflow Controlled Computing
This software was developed by enhancing Cloudmesh, a suite of
software that makes using cloud and HPC resources easier. Specifically,
we have added a library called Cloudmesh Controlled Computing
(cloudmesh-cc) that adds workflow features to control the execution of
jobs on remote compute resources.
The goal is to provide numerous methods of specifying the workflows on
a local computer and running them on remote services such as HPC and
cloud computing resources. This includes REST services and command
line tools. The software is freely available and can easily
be installed with standard Python tools, so integration into the Python
ecosystem using virtualenv and Anaconda is simple.
## Workflow Functionality
A hybrid multi-cloud analytics service framework was created to manage
heterogeneous and remote workflows, queues, and jobs. It was designed
for access through both the command line and REST services
to simplify the coordination of tasks on remote computers. In
addition, this service supports multiple operating systems such as macOS,
Linux, and Windows 10 and 11, on various hosts: the computer's
localhost, remote computers, and the Linux-based Windows Subsystem for Linux (WSL).
Jobs can be visualized and saved as YAML and SVG data files. The
workflow framework was extensively tested for functionality and reproducibility.
## Quickstart
To test the workflow program, prepare a cm directory in your home
directory by executing the following commands in a terminal:
```bash
mkdir ~/cm
cd ~/cm
pip install cloudmesh-installer -U
cloudmesh-installer get cc
cd cloudmesh-cc
pytest -v -x --capture=no tests/test_199_workflow_clean.py
```
This test runs three jobs within a single workflow: the first job
runs a local shell script, the second runs a local Python script, and
the third runs a local Jupyter notebook.
## Application demonstration using MNIST
The Modified National Institute of Standards and Technology (MNIST) database
is a machine learning database based on image processing. Various MNIST
programs involving different machine learning cases were modified and
tested on various local and remote machines. These cases include
Multilayer Perceptron, LSTM (long short-term memory), Auto-Encoder,
Convolutional and Recurrent Neural Networks, Distributed Training,
and PyTorch training.
See @fig:workflow-uml.
![Design for the workflow.](images/workflow-uml.png){#fig:workflow-uml}
## Design
The hybrid multi-cloud analytics service framework was created to
enable running jobs across many platforms. We designed a small and
streamlined set of abstractions so that jobs and workflows can be
represented easily. The design is flexible and can be expanded, as each
job can contain arbitrary arguments. This made it possible to design
a specific job type for each target type so that execution on
local and remote compute resources, including batch systems,
can be achieved. The supported job types include local jobs on Linux,
macOS, Windows 10, and Windows 11; jobs running in WSL on Windows
computers; remote jobs using ssh; and batch jobs using Slurm.
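The idea of one small job abstraction with arbitrary arguments and a per-target job type can be sketched as follows. This is a minimal illustration, not the cloudmesh-cc implementation: the `Job` dataclass, the `kind` values, the `user` and `host` keys, and the `command_for` helper are all hypothetical names chosen for this example. The sketch only builds the command line that would start a job and does not execute anything.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Job:
    """Hypothetical job record; not the cloudmesh-cc Job class."""
    name: str
    kind: str        # assumed values: "local", "wsl", "ssh", "slurm"
    script: str      # path of the script to run
    kwargs: Dict[str, Any] = field(default_factory=dict)  # arbitrary extras


def command_for(job: Job) -> List[str]:
    """Translate a job into the command that would start it."""
    if job.kind == "local":
        return ["sh", job.script]
    if job.kind == "wsl":
        # Run the script inside the default WSL distribution.
        return ["wsl", "sh", job.script]
    if job.kind in ("ssh", "slurm"):
        target = f"{job.kwargs['user']}@{job.kwargs['host']}"
        # Slurm jobs are submitted with sbatch on the remote login node.
        runner = "sbatch" if job.kind == "slurm" else "sh"
        return ["ssh", target, runner, job.script]
    raise ValueError(f"unsupported job kind: {job.kind}")


remote = Job("train", "slurm", "train.sh",
             {"user": "alice", "host": "hpc.example.org"})
print(command_for(remote))
```

Because target-specific details live in `kwargs`, adding a new target type in this sketch only requires one more branch in `command_for`.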
In addition, we leveraged the existing NetworkX graph framework to
express dependencies between jobs. This greatly reduced the complexity
of the implementation while letting us leverage graphical displays
of the workflow and schedule jobs with, for example, the
topological sort available in NetworkX. Custom schedulers can be
designed easily based on the dependencies and job types managed
through this straightforward interface. The status of the jobs is
stored in a database that can be monitored during program
execution. Jobs are created on the fly: a job is created only when it
is needed, which is determined from the dependencies once all its
parents are resolved. This is especially important as it allows
dynamic workflow patterns to be implemented in which results from
previous calculations can be used in later stages of the workflow.
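Dependency-driven scheduling of this kind can be illustrated with a short sketch. To keep the example free of third-party packages it uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or newer) as a stand-in for the NetworkX topological sort that the framework relies on. The job names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key maps a job to the set of its parent jobs.
dependencies = {
    "fetch-data": {"start"},
    "train": {"fetch-data"},
    "evaluate": {"fetch-data"},
    "end": {"train", "evaluate"},
}


def run_workflow(deps, run):
    """Run jobs in dependency order, releasing a job once its parents finish."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps contain a cycle
    order = []
    while sorter.is_active():
        # get_ready() returns only jobs whose parents are all marked done.
        for name in sorted(sorter.get_ready()):  # sorted for a stable order
            run(name)          # a job would be created and started here
            order.append(name)
            sorter.done(name)  # unblocks the children of this job
    return order


executed = run_workflow(dependencies, run=lambda name: None)
print(executed)
```

Since a job is only handed to `run` after its parents are done, the callback could inspect earlier results before deciding how to create the job, which mirrors the on-the-fly creation described above.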
We have developed a simple-to-use API for this so programs can be
formulated using the API in Python. We also embedded this API
in a prototype REST service to showcase that integration into
language-independent frameworks is possible. The essential functions to
manage workflows are supported, including graph specification through
configuration files, upload of workflows, export, adding jobs and
dependencies, and visualizing the workflow during execution. An
important feature that we added is the monitoring of jobs
through progress reports obtained by automated log file mining. This way
each job reports its progress during execution. This is especially
important when running very complex and long-running jobs.
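Progress reporting through log file mining can be sketched as below. The marker format `# cloudmesh status=... progress=...` is an assumption made for this illustration and may differ from what cloudmesh-cc actually writes; the `latest_progress` helper and the sample log are likewise hypothetical.

```python
import re
from typing import Optional

# Assumed marker format, e.g. "# cloudmesh status=running progress=40".
PROGRESS = re.compile(r"#\s*cloudmesh\s+status=(\w+)\s+progress=(\d+)")


def latest_progress(log_text: str) -> Optional[dict]:
    """Return the last status/progress marker found in a log, or None."""
    found = None
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            found = {"status": match.group(1),
                     "progress": int(match.group(2))}
    return found


sample_log = """\
loading data
# cloudmesh status=running progress=10
epoch 1 done
# cloudmesh status=running progress=60
# cloudmesh status=done progress=100
"""

print(latest_progress(sample_log))
```

A monitor built this way only needs read access to the job's log file, so it works the same for local, ssh, and batch jobs as long as the log can be fetched.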
The REST service was implemented in FastAPI to provide a small but
fast service with a much smaller footprint for implementation
and setup than other similar REST service frameworks for
Python.
The architectural components of this framework are depicted in
@fig:workflow-uml. The code is available in this repository and
manual pages are provided on how to install it:
[cloudmesh-cc](https://github.com/cloudmesh/cloudmesh-cc).
## Summary
The main interaction with the workflow is through the command line.
With the framework, researchers and scientists should be able to
create jobs on their own, place them in the workflow, and run them on
various types of computers.
In addition, developers and users can utilize the built-in OpenAPI
graphical user interface to manage
workflows and their jobs. Workflows can be uploaded as YAML files or added individually
through the built-in debug framework.
Future improvements to this project will include code cleanup and further development of the manual.
## References
A poster based on a pre-alpha version of this code is available as PPTX
and PDF files. However, that version is no longer valid and is
superseded by much improved efforts. The code summarized in the
pre-alpha version was mainly used to teach a number of students Python
and how to work in a team.
* [Poster Presentation (PPTX)](https://github.com/cloudmesh/cloudmesh-cc/raw/main/documents/analytics-service.pptx)
* [Poster Presentation (PDF)](https://github.com/cloudmesh/cloudmesh-cc/raw/main/documents/analytics-service.pdf)
Please note also that the poster contains inaccurate statements and
descriptions and should not be used as a reference to this work.
## Acknowledgments
Continued work was in part funded by the NSF CyberTraining: CIC:
CyberTraining for Students and Technologies from Generation Z with the
award numbers 1829704 and 2200409.
We would like to thank the following contributors for their help with and evaluation of a
pre-alpha version of the code: Jackson Miskill, Alex Beck, Alison Lu.
We are excited that this effort contributed significantly to their
increased understanding of Python and how to develop in a team using
the Python ecosystem.
Raw data
{
"_id": null,
"home_page": "",
"name": "cloudmesh-cc",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Gregor von Laszewski <laszewski@gmail.com>",
"keywords": "helper library,cloudmesh",
"author": "",
"author_email": "Gregor von Laszewski <laszewski@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f9/61/1e8b1e5a7e79486cdcceca0e22597bd6f1b382f1edb84098a415fe469d0b/cloudmesh-cc-5.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ Copyright 2017 Gregor von Laszewski, Indiana University Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ",
"summary": "The cloudmesh compute coordinator",
"version": "5.0.3",
"project_urls": {
"Changelog": "https://github.com/cloudmesh/cloudmesh-cc/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/cloudmesh/cloudmesh-cc/blob/main/README.md",
"Homepage": "https://github.com/cloudmesh/cloudmesh-cc",
"Issues": "https://github.com/cloudmesh/cloudmesh-cc/issues",
"Repository": "https://github.com/cloudmesh/cloudmesh-cc.git"
},
"split_keywords": [
"helper library",
"cloudmesh"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9a33801653083cead159ab9a1150bfd1352b3361684fd109c3b79100688c81da",
"md5": "1b0ac45c34cc111bdf6992a11176d964",
"sha256": "efb8fc121a31bdd46f52af9c84103a23fe9cb9d62eaf0975c073ade49a7031d6"
},
"downloads": -1,
"filename": "cloudmesh_cc-5.0.3-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "1b0ac45c34cc111bdf6992a11176d964",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 68858,
"upload_time": "2023-12-23T05:39:57",
"upload_time_iso_8601": "2023-12-23T05:39:57.966725Z",
"url": "https://files.pythonhosted.org/packages/9a/33/801653083cead159ab9a1150bfd1352b3361684fd109c3b79100688c81da/cloudmesh_cc-5.0.3-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f9611e8b1e5a7e79486cdcceca0e22597bd6f1b382f1edb84098a415fe469d0b",
"md5": "e8284de90b8ef05447ef0b978cc38271",
"sha256": "dc04f2cb04fd64687d43ef52624311fcc5c4c61f1ff535d6e96228e1bfdc9788"
},
"downloads": -1,
"filename": "cloudmesh-cc-5.0.3.tar.gz",
"has_sig": false,
"md5_digest": "e8284de90b8ef05447ef0b978cc38271",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 77346,
"upload_time": "2023-12-23T05:40:00",
"upload_time_iso_8601": "2023-12-23T05:40:00.314849Z",
"url": "https://files.pythonhosted.org/packages/f9/61/1e8b1e5a7e79486cdcceca0e22597bd6f1b382f1edb84098a415fe469d0b/cloudmesh-cc-5.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-23 05:40:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cloudmesh",
"github_project": "cloudmesh-cc",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "cloudmesh-cc"
}