pyremotedata


* Name: pyremotedata
* Version: 0.0.50
* Summary: A package for low- and high-level high-bandwidth asynchronous data transfer
* Upload time: 2025-07-10 08:24:40
* Requires Python: >=3.8.10
* Requirements: No requirements were recorded.

# `pyRemoteData`
`pyRemoteData` is a module developed for scientific computation using the remote storage platform [ERDA](https://erda.au.dk/) (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.

It can be used with **any** storage facility that supports SFTP and LFTP, but it has only been tested against a minimal SFTP server ([atmoz/sftp](https://hub.docker.com/r/atmoz/sftp)) and the live AU ERDA service, which runs on MiG (Minimum intrusion Grid - [SourceForge](https://sourceforge.net/projects/migrid/)/[GitHub](https://github.com/ucphhpc/migrid-sync)), developed by the [SCIENCE HPC Centre at Copenhagen University](https://science.ku.dk/english/research/research-e-infrastructure/science-hpc-centre/).

## Capabilities
To facilitate high-throughput computation in a cross-platform setting, `pyRemoteData` handles data transfer with multithreading and asynchronous data streaming using thread-safe buffers.
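
The general idea of streaming through a thread-safe buffer can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the function name `stream_chunks`, the `buffer_size` parameter and the fake byte chunks are all assumptions made for the example.

```python
import queue
import threading


def stream_chunks(chunks, buffer_size=4):
    """Yield items from `chunks` while a background thread fills a bounded buffer.

    Hypothetical helper for illustration only; not part of pyRemoteData.
    """
    buffer = queue.Queue(maxsize=buffer_size)  # thread-safe, bounded FIFO
    sentinel = object()  # marks the end of the stream

    def producer():
        for chunk in chunks:
            buffer.put(chunk)  # blocks while the buffer is full
        buffer.put(sentinel)

    # Daemon thread, so an abandoned stream does not keep the interpreter alive
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer delivers something
        if item is sentinel:
            break
        yield item


# Stand-in for data arriving from remote storage
received = list(stream_chunks(f"chunk-{i}".encode() for i in range(10)))
print(len(received))
```

The bounded queue lets the producer run ahead of the consumer by at most `buffer_size` items, which keeps memory use predictable while still overlapping transfer and computation.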

## Use-cases
If your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model training, then this module may be of use to you.
Experience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configuration.

See **Automated** for details on how to avoid having to set up SSH configuration.

## Setup
A more user-friendly setup process, which supports both automated and interactive setup, is currently in development. (**TODO**: Finish and describe the setup process)

### Installation
The package is available on PyPI, and can be installed using pip:
```bash
pip install pyremotedata
```

### Interactive
Simply follow the popup instructions that appear once you load the package for the first time.

### Automated
The automatic configuration setup relies on setting the correct environment variables **BEFORE LOADING THE PACKAGE**:

* `PYREMOTEDATA_REMOTE_USERNAME` : Should be set to your username on your remote service.
* `PYREMOTEDATA_REMOTE_URI` : Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is "io.erda.au.dk").
* `PYREMOTEDATA_REMOTE_DIRECTORY` : The default working directory. Set this to a path such as "/MY_PROJECT/DATASETS" if you do not want to start at the root of your remote storage; otherwise simply set it to "/".
* `PYREMOTEDATA_AUTO` : Should be **set to "yes"** (not case-sensitive) to disable interactive mode. If this is unset, or set to anything other than "yes", while any of the variables above are unset, an error will be thrown.
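
Because the variables must be in place before the package is imported, it can help to check them first. The sketch below is a stand-alone pre-flight check that uses only the variable names listed above; the helpers `missing_settings` and `auto_mode_enabled` are hypothetical and not part of `pyRemoteData`, and the "yes" comparison simply mirrors the behaviour described in the list.

```python
import os

# Variable names taken from the list above
REQUIRED_VARS = (
    "PYREMOTEDATA_REMOTE_USERNAME",
    "PYREMOTEDATA_REMOTE_URI",
    "PYREMOTEDATA_REMOTE_DIRECTORY",
)


def missing_settings(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def auto_mode_enabled(env=None):
    """Return True if PYREMOTEDATA_AUTO is "yes", ignoring case (hypothetical helper)."""
    env = os.environ if env is None else env
    return env.get("PYREMOTEDATA_AUTO", "").lower() == "yes"


# Check a sample environment instead of the real one
example = {"PYREMOTEDATA_REMOTE_USERNAME": "username", "PYREMOTEDATA_AUTO": "YES"}
print(missing_settings(example))
print(auto_mode_enabled(example))
```

Calling both helpers without arguments inspects the real environment, so they can be run at the top of a script before `pyremotedata` is imported.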

The recommended way to avoid setting up SSH configuration or environment variables altogether is to pass the settings directly:
```py
from pyremotedata.implicit_mount import IOHandler
# Replace <keyfile>, <USER> and <REMOTE> with your own values
settings = {"sftp:connect-program": "ssh -a -x -i <keyfile>"}
with IOHandler(lftp_settings=settings, user="<USER>", remote="<REMOTE>") as io:
    ...
```
Here `<keyfile>` is the path to your private SSH key, probably something like `~/.ssh/id_rsa`.

### Example
If you want to test against a mock server, simply follow the instructions in tests/README.

If you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:
```python
# Set the environment variables (only necessary in a non-interactive setting)
# If you are simply running this as a Python script, 
# you can omit these lines and you will be prompted to set them interactively
import os
os.environ["PYREMOTEDATA_REMOTE_USERNAME"] = "username"
os.environ["PYREMOTEDATA_REMOTE_URI"] = "storage.example.com"
os.environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = "/MY_PROJECT/DATASETS"
os.environ["PYREMOTEDATA_AUTO"] = "yes"

from pyremotedata.implicit_mount import IOHandler

handler = IOHandler()

with handler as io:
    print(io.ls())

# The configuration is persistent, but can be removed using the following:
from pyremotedata.config import remove_config
remove_config()
```

## Issues
This module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pyremotedata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8.10",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Asger Svenning <asgersvenning@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/56/27/3c0977a4cc0ad6ee64655172456a6ab60c371480e76b7e7037e25d12ebf6/pyremotedata-0.0.50.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A package for low- and high-level high-bandwidth asynchronous data transfer",
    "version": "0.0.50",
    "project_urls": {
        "Bug Tracker": "https://github.com/asgersvenning/pyremotedata/issues",
        "Homepage": "https://github.com/asgersvenning/pyremotedata"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8cc4b6d4c706bf46cbf8b47296f884b657da7ef90d2783a0801118d001af45ff",
                "md5": "1af715739baa979fc63def0fd58878d4",
                "sha256": "95881cb93c8fefd319830736c6a4b51f38194c93220ce1168136ca80d53e9846"
            },
            "downloads": -1,
            "filename": "pyremotedata-0.0.50-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1af715739baa979fc63def0fd58878d4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.10",
            "size": 27999,
            "upload_time": "2025-07-10T08:24:39",
            "upload_time_iso_8601": "2025-07-10T08:24:39.582341Z",
            "url": "https://files.pythonhosted.org/packages/8c/c4/b6d4c706bf46cbf8b47296f884b657da7ef90d2783a0801118d001af45ff/pyremotedata-0.0.50-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "56273c0977a4cc0ad6ee64655172456a6ab60c371480e76b7e7037e25d12ebf6",
                "md5": "58e5ee2c0fc171c57c7e395e3c9f0c5d",
                "sha256": "91669943df79a79030cf748a2e433eef6fa32a37cb854037d6be32e2a466c60e"
            },
            "downloads": -1,
            "filename": "pyremotedata-0.0.50.tar.gz",
            "has_sig": false,
            "md5_digest": "58e5ee2c0fc171c57c7e395e3c9f0c5d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.10",
            "size": 935488,
            "upload_time": "2025-07-10T08:24:40",
            "upload_time_iso_8601": "2025-07-10T08:24:40.643069Z",
            "url": "https://files.pythonhosted.org/packages/56/27/3c0977a4cc0ad6ee64655172456a6ab60c371480e76b7e7037e25d12ebf6/pyremotedata-0.0.50.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-10 08:24:40",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "asgersvenning",
    "github_project": "pyremotedata",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pyremotedata"
}
        