| Field | Value |
|---|---|
| Name | pyremotedata |
| Version | 0.0.36 |
| home_page | None |
| Summary | A package for low- and high-level high-bandwidth asynchronous data transfer |
| upload_time | 2024-11-05 09:43:02 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# `pyRemoteData`
`pyRemoteData` is a module developed for scientific computation using the remote storage platform [ERDA](https://erda.au.dk/) (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.
It can be used with **any** passwordless SSH-enabled storage facility that supports SFTP and LFTP. However, it has only been tested on a minimal SFTP server ([atmoz/sftp](https://hub.docker.com/r/atmoz/sftp)) and on the live AU ERDA service, which runs on MiG (Minimum intrusion Grid - [SourceForge](https://sourceforge.net/projects/migrid/)/[GitHub](https://github.com/ucphhpc/migrid-sync)), developed by the [SCIENCE HPC Centre at Copenhagen University](https://science.ku.dk/english/research/research-e-infrastructure/science-hpc-centre/).
If your facility requires a password, it should be straightforward to modify the code to support this: password authentication is already implemented, but not exposed to the user.
Simply change line 76 in src/remote_data/implicit_mount.py to fetch the password from an environment variable of your choice, or hardcode it. However, do this at your own risk, as I have not assessed the security implications.
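If you go the environment-variable route, a helper along these lines keeps the secret out of the source file. This is a minimal sketch: the variable name `MY_SFTP_PASSWORD` and the function `read_password_from_env` are hypothetical, and how the value is wired into the code at the line mentioned above is not shown.

```python
import os


def read_password_from_env(var_name: str = "MY_SFTP_PASSWORD") -> str:
    """Return the password stored in `var_name` (a hypothetical variable name).

    Raises instead of silently falling back to an empty password, so a
    missing variable is noticed before any connection attempt.
    """
    password = os.environ.get(var_name)
    if not password:
        raise RuntimeError(f"Environment variable {var_name!r} is not set or is empty")
    return password
```

Note that environment variables may still be readable by other processes running as the same user, so this avoids hardcoding but is not a full secret store.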
## Capabilities
In order to facilitate high-throughput computation in a cross-platform setting, `pyRemoteData` handles data transfer using multithreading and asynchronous data streaming through thread-safe buffers.
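The general pattern (worker threads filling a bounded, thread-safe buffer that a consumer drains) can be sketched with the standard library alone. This is not pyRemoteData's implementation; `buffered_stream` is a hypothetical name, and `fetch` stands in for whatever transfers a single file (for example an SFTP/LFTP download).

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, Tuple

_DONE = object()  # sentinel each worker puts on the buffer when it finishes


def buffered_stream(
    paths: Iterable[str],
    fetch: Callable[[str], bytes],
    n_workers: int = 4,
    buffer_size: int = 8,
) -> Iterator[Tuple[str, bytes]]:
    """Fetch `paths` on `n_workers` threads and yield (path, data) as they complete.

    `buffer` is a bounded queue.Queue: workers block when it is full, so memory
    use stays bounded while the consumer catches up. Results arrive in
    completion order, not input order.
    """
    todo = queue.Queue()
    for path in paths:
        todo.put(path)
    buffer = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            while True:
                try:
                    path = todo.get_nowait()
                except queue.Empty:
                    return  # no work left for this thread
                buffer.put((path, fetch(path)))
        except Exception as exc:  # forward transfer errors to the consumer
            buffer.put(exc)
        finally:
            buffer.put(_DONE)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()

    finished = 0
    while finished < n_workers:
        item = buffer.get()
        if item is _DONE:
            finished += 1
        elif isinstance(item, Exception):
            raise item
        else:
            yield item
    for t in threads:
        t.join()
```

The bounded queue provides backpressure: a slow consumer (e.g. a training loop) throttles the downloads instead of letting them fill memory.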
## Use-cases
If your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model training, this module may be of use to you.
Experience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configuration.
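One way to confirm that passwordless access works before using the package is to open a non-interactive SFTP session. A sketch assuming the OpenSSH `sftp` client is on your PATH; the helper names are hypothetical and not part of pyRemoteData, and this checks key-based SFTP login only, not LFTP.

```python
import shutil
import subprocess
from typing import List


def passwordless_sftp_command(user: str, host: str, timeout: int = 10) -> List[str]:
    """Build an OpenSSH `sftp` call that fails instead of prompting for a password."""
    return [
        "sftp",
        "-o", "BatchMode=yes",              # never prompt for passwords or passphrases
        "-o", f"ConnectTimeout={timeout}",  # give up on unreachable hosts
        "-b", "-",                          # read batch commands from stdin
        f"{user}@{host}",
    ]


def has_passwordless_sftp(user: str, host: str, timeout: int = 10) -> bool:
    """Return True if a non-interactive SFTP session to user@host succeeds."""
    if shutil.which("sftp") is None:
        raise FileNotFoundError("OpenSSH 'sftp' client not found on PATH")
    result = subprocess.run(
        passwordless_sftp_command(user, host, timeout),
        input=b"pwd\n",  # one harmless command; EOF then ends the session
        capture_output=True,
        timeout=timeout + 30,
    )
    return result.returncode == 0
```

With `BatchMode=yes`, a host whose key is not yet in `known_hosts` also fails, so connect once interactively to accept the host key first.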
## Setup
A more user-friendly setup process, which facilitates both automated and interactive setup, is currently in development. (**TODO**: Finish and describe the setup process)
### Installation
The package is available on PyPI, and can be installed using pip:
```bash
pip install pyremotedata
```
### Interactive
Simply follow the popup instructions that appear once you load the package for the first time.
### Automated
The automatic configuration setup relies on setting the correct environment variables **BEFORE LOADING THE PACKAGE**:
* `PYREMOTEDATA_REMOTE_USERNAME` : Should be set to your username on your remote service.
* `PYREMOTEDATA_REMOTE_URI` : Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is "io.erda.au.dk").
* `PYREMOTEDATA_REMOTE_DIRECTORY` : If you would like to set a default working directory, that is not the root of your remote storage, then set this to that (e.g. "/MY_PROJECT/DATASETS") otherwise simply set this to "/".
* `PYREMOTEDATA_AUTO` : Should be **set to "yes"** to disable interactive mode. If it is unset, or set to anything other than "yes" (case-insensitive), while any of the variables above are unset, an error is raised.
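For batch jobs it can help to check these variables yourself before the package is loaded, so a misconfigured job fails immediately with a clear message. A minimal sketch; `missing_pyremotedata_vars` and `preflight` are hypothetical helpers, not part of pyRemoteData, and they only mirror the rules listed above.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = (
    "PYREMOTEDATA_REMOTE_USERNAME",
    "PYREMOTEDATA_REMOTE_URI",
    "PYREMOTEDATA_REMOTE_DIRECTORY",
)


def missing_pyremotedata_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def preflight(env: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast before importing pyremotedata in a non-interactive job."""
    env = os.environ if env is None else env
    missing = missing_pyremotedata_vars(env)
    if missing:
        raise RuntimeError("Unset PYREMOTEDATA variables: " + ", ".join(missing))
    # Only "yes" (any capitalization) disables interactive mode.
    if env.get("PYREMOTEDATA_AUTO", "").lower() != "yes":
        raise RuntimeError('PYREMOTEDATA_AUTO must be "yes" for a non-interactive run')
```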
### Example
If you want to test against a mock server simply follow the instructions in tests/README.
If you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:
```python
# Set the environment variables (only necessary in a non-interactive setting)
# If you are simply running this as a Python script,
# you can omit these lines and you will be prompted to set them interactively
import os
os.environ["PYREMOTEDATA_REMOTE_USERNAME"] = "username"
os.environ["PYREMOTEDATA_REMOTE_URI"] = "storage.example.com"
os.environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = "/MY_PROJECT/DATASETS"
os.environ["PYREMOTEDATA_AUTO"] = "yes"

from pyremotedata.implicit_mount import IOHandler

handler = IOHandler()

with handler as io:
    print(io.ls())

# The configuration is persistent, but can be removed using the following:
from pyremotedata.config import remove_config
remove_config()
```
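Because the variables must be set **before** the package is loaded, a small guard can catch the mistake of importing first. A sketch under that assumption; `set_pyremotedata_env` is a hypothetical helper, not part of pyRemoteData, and it only inspects `sys.modules`, so it cannot detect a configuration persisted from an earlier run.

```python
import os
import sys


def set_pyremotedata_env(username: str, uri: str, directory: str = "/") -> None:
    """Set the documented variables, refusing if pyremotedata is already imported."""
    already_loaded = any(
        name == "pyremotedata" or name.startswith("pyremotedata.") for name in sys.modules
    )
    if already_loaded:
        raise RuntimeError("Set the PYREMOTEDATA_* variables before importing pyremotedata")
    os.environ.update({
        "PYREMOTEDATA_REMOTE_USERNAME": username,
        "PYREMOTEDATA_REMOTE_URI": uri,
        "PYREMOTEDATA_REMOTE_DIRECTORY": directory,
        "PYREMOTEDATA_AUTO": "yes",  # disable interactive prompts
    })
```

Call it first, e.g. `set_pyremotedata_env("username", "storage.example.com", "/MY_PROJECT/DATASETS")`, and only then import `IOHandler`.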
## Issues
This module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.
Raw data
{
"_id": null,
"home_page": null,
"name": "pyremotedata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8.10",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Asger Svenning <asgersvenning@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/9d/cd/d79f39ab650118c04832d3811fcf6e74f4c9b3d3f634a903afcb837a5d69/pyremotedata-0.0.36.tar.gz",
"platform": null,
"description": "# `pyRemoteData`\n`pyRemoteData` is a module developed for scientific computation using the remote storage platform [ERDA](https://erda.au.dk/) (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.\n\nIt can be used with **any** passwordless SSH-enabled storage facility that supports SFTP and LFTP. But is only tested on a minimal SFTP server found at [atmoz/sftp](https://hub.docker.com/r/atmoz/sftp) and on the live AU ERDA service which runs on MiG (Minimum intrusion Grid - [SourceForge](https://sourceforge.net/projects/migrid/)/[GitHub](https://github.com/ucphhpc/migrid-sync)) developed by [SCIENCE HPC Centre at Copenhagen University](https://science.ku.dk/english/research/research-e-infrastructure/science-hpc-centre/).\n\nIf your facility requires a password, it should be very easy to modify the code to support this, in fact it is already implemented, but not exposed to the user.\nMerely change line 76 in src/remote_data/implicit_mount.py to fetch the password from the environment variable of your choice, or simply hardcode it. However, do this at your own risk, as I have not assessed the security implications.\n\n## Capabilities\nIn order to facility high-throughput computation in a cross-platform setting, `pyRemoteData` handles data transfer with multithreading and asynchronous data streaming using thread-safe buffers.\n\n## Use-cases\nIf your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model-training, then this module may be of use to you.\nExperience with SFTP or LFTP is not necessary, but you must be able to setup the required SSH configurations.\n\n## Setup\nA more user-friendly setup process, which facilitates both automated as well as interactive setup is currently in development. 
(**TODO**: Finish and describe the setup process)\n\n### Installation\nThe package is available on PyPI, and can be installed using pip:\n```bash\npip install pyremotedata\n```\n\n### Interactive\nSimply follow the popup instructions that appear once you load the package for the first time.\n\n### Automated\nThe automatic configuration setup relies on setting the correct environment variables **BEFORE LOADING THE PACKAGE**:\n\n* `PYREMOTEDATA_REMOTE_USERNAME` : Should be set to your username on your remote service.\n* `PYREMOTEDATA_REMOTE_URI` : Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is \"io.erda.au.dk\").\n* `PYREMOTEDATA_REMOTE_DIRECTORY` : If you would like to set a default working directory, that is not the root of your remote storage, then set this to that (e.g. \"/MY_PROJECT/DATASETS\") otherwise simply set this to \"/\".\n* `PYREMOTEDATA_AUTO` : Should be **set to \"yes\"** to disable interactive mode. If this is not set, or set to anything other than \"yes\" (not case-sensitive), while any of the prior environment variables are unset an error will be thrown.\n\n### Example\nIf you want to test against a mock server simply follow the instructions in tests/README.\n\nIf you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:\n```python\n# Set the environment variables (only necessary in a non-interactive setting)\n# If you are simply running this as a Python script, \n# you can omit these lines and you will be prompted to set them interactively\nimport os\nos.environ[\"PYREMOTEDATA_REMOTE_USERNAME\"] = \"username\"\nos.environ[\"PYREMOTEDATA_REMOTE_URI\"] = \"storage.example.com\"\nos.environ[\"PYREMOTEDATA_REMOTE_DIRECTORY\"] = \"/MY_PROJECT/DATASETS\"\nos.environ[\"PYREMOTEDATA_AUTO\"] = \"yes\"\n\nfrom pyremotedata.implicit_mount import IOHandler\n\nhandler = IOHandler()\n\nwith handler as io:\n print(io.ls())\n\n# The 
configuration is persistent, but can be removed using the following:\nfrom pyremotedata.config import remove_config\nremove_config()\n```\n\n## Issues\nThis module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions is highly appreciated.",
"bugtrack_url": null,
"license": null,
"summary": "A package for low- and high-level high-bandwidth asynchronous data transfer",
"version": "0.0.36",
"project_urls": {
"Bug Tracker": "https://github.com/asgersvenning/pyremotedata/issues",
"Homepage": "https://github.com/asgersvenning/pyremotedata"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ce8cde5e331e2d6cbc0cd9dafed57ab02862f8dd48143d7bbd2af735230a3157",
"md5": "3841f9e8a7d6c5aef7c2b8cf07ccabcc",
"sha256": "ed286e29e6e89899ae687be74c7f8549ebf2429dbe266e71996ae5d7aa56770e"
},
"downloads": -1,
"filename": "pyremotedata-0.0.36-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3841f9e8a7d6c5aef7c2b8cf07ccabcc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.10",
"size": 27815,
"upload_time": "2024-11-05T09:43:00",
"upload_time_iso_8601": "2024-11-05T09:43:00.725125Z",
"url": "https://files.pythonhosted.org/packages/ce/8c/de5e331e2d6cbc0cd9dafed57ab02862f8dd48143d7bbd2af735230a3157/pyremotedata-0.0.36-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9dcdd79f39ab650118c04832d3811fcf6e74f4c9b3d3f634a903afcb837a5d69",
"md5": "87c4749311f16325a6d5927edc10f73d",
"sha256": "3863c1b77c02a0206777e8e3490ee83f9c2d38fdb68b4a5d00ed02015ccc1d31"
},
"downloads": -1,
"filename": "pyremotedata-0.0.36.tar.gz",
"has_sig": false,
"md5_digest": "87c4749311f16325a6d5927edc10f73d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.10",
"size": 933009,
"upload_time": "2024-11-05T09:43:02",
"upload_time_iso_8601": "2024-11-05T09:43:02.178284Z",
"url": "https://files.pythonhosted.org/packages/9d/cd/d79f39ab650118c04832d3811fcf6e74f4c9b3d3f634a903afcb837a5d69/pyremotedata-0.0.36.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-05 09:43:02",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "asgersvenning",
"github_project": "pyremotedata",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pyremotedata"
}
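The `sha256` digests listed above can be used to check a downloaded file before installing it. A minimal sketch using only `hashlib`; the function names are hypothetical, and the expected digest must be copied from the entry for the exact file you downloaded (wheel or sdist).

```python
import hashlib
import os
from typing import Union

PathLike = Union[str, os.PathLike]


def sha256_of_file(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hash `path` in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: PathLike, expected_sha256: str) -> bool:
    """Compare the file's digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("pyremotedata-0.0.36-py3-none-any.whl", "ed286e29e6e89899ae687be74c7f8549ebf2429dbe266e71996ae5d7aa56770e")`. pip's hash-checking mode (`--hash=sha256:...` in a requirements file) enforces the same check at install time.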