| Field | Value |
| --- | --- |
| Name | stow |
| Version | 1.4.1 |
| Summary | stow artefacts anywhere, with ease |
| Upload time | 2024-04-25 09:47:15 |
| Author | Kieran Bacon |
| Keywords | aws, s3, boto3, ssh, os |
| Requirements | No requirements were recorded. |
# Stow
`stow` is a package that supercharges your interactions with files and directories, and enables you to write filesystem agnostic code. With `stow` you can access and manipulate local and remote artefacts seamlessly with a rich and familiar interface. `stow` gives abstraction from storage implementations and solves compatibility issues, allowing code to be highly flexible.
`stow` is meant to be a drop in replacement for the `os.path` module, providing full coverage of its interface. Furthermore, `stow` extends the interface to work with remote files and directories and to include methods that follow conventional artefact manipulation paradigms like `put`, `get`, `ls`, `rm`, in a concise and highly functional manner.
```python
import stow

for art in stow.ls():
    print(art)
# <stow.Directory: /home/kieran/stow/.pytest_cache>
# <stow.Directory: /home/kieran/stow/tests>
# <stow.File: /home/kieran/stow/mkdocs.yml modified(2020-06-27 10:24:10.909885) size(68 bytes)>
# <stow.File: /home/kieran/stow/requirements.txt modified(2020-05-25 14:00:59.423165) size(16 bytes)>
# ...

with stow.open("requirements.txt", "r") as handle:
    print(handle.read())
# tqdm
# pyini
# boto3

stow.put("requirements.txt", "s3://example-bucket/projects/stow/requirements.txt")

with stow.open("s3://example-bucket/projects/stow/requirements.txt", "r") as handle:
    print(handle.read())
# tqdm
# pyini
# boto3

print(stow.getmtime("s3://example-bucket/projects/stow/requirements.txt"))
# 1617381185.341602
```
## Why use stow?
`stow` offers advantages for developers who work locally and for those who work remotely. `stow` aims to simplify and empower all interactions with files and directories, solving many of the problems that recur from project to project. Tasks such as filtering directories, accessing file metadata, and recursively searching for files are now as easy as you'd expect them to be.
<p role="code-header">For example, this...</p>
```python
import os
import shutil
import datetime

source = 'path'
destination = 'path'
recent = datetime.datetime(2021, 5, 4)

for root, dirs, files in os.walk(source):
    for name in files:
        filepath = os.path.join(root, name)
        modifiedTime = datetime.datetime.fromtimestamp(os.path.getmtime(filepath))

        if modifiedTime > recent:
            # Mirror the relative location of the file under the destination
            target = os.path.join(destination, os.path.relpath(filepath, source))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy(filepath, target)
```
<p role="code-header">will become this...</p>
```python
import stow
import datetime
source = 'path'
destination = 'path'
recent = datetime.datetime(2021, 5, 4)
for file in stow.ls(source, recursive=True):
if isinstance(file, stow.File) and file.modifiedTime > recent:
stow.cp(file, stow.join(destination, stow.relpath(file, source)))
```
**However**, the ultimate power that `stow` provides is the time saving and confidence brought by removing the need to write complicated methods for handling multiple backend storage solutions in your application.
This matters especially when you consider the effort spent supporting the various stages of an application's development cycle, only for good work to be abandoned when a single implementation is used live. (Yes, preferably all those stages are identical, but this is never the case.)
**You shouldn't be focusing on storage management, you should be focusing on your solutions**
Consider the following scenario: as part of a development team, you have been asked to write the code that handles the loading of application configuration, and you've been sent a few JSON files. You create and test a method that reads in the configuration files and passes them on to the next step in your application.
This works perfectly fine locally, but it turns out that the application is going to be deployed as a Docker container running on AWS ECS. The configuration files will need to be hosted on S3 and accessed by the container on startup.
Well, you have to write a different method that uses `boto3` to connect to the bucket and pull them out. You set up a test bucket and an application IAM user with optimistic permissions to test your new method with, and get cracking.
You'll then have to add in some logic before this section in the application to handle the possibility of reading the files locally or remotely. This may come in the form of changes to your cli, api, etc, so you do that.
Then from up high, word comes that some of the configuration you are doing will need to change dynamically while the application is running. Your team has decided that the app will monitor one of the configuration files for changes and reload it when it does.
To maintain the local and remote duality of your application, you get to work updating both methods to check for updates, and then test.
**So what have you achieved?** Sad to say, very little. You've spent a lot of time getting up to speed with `boto3` (or re-implementing work from another project), and then you dove back into the deep end trying to work out how to get the modified time of files. You've supported two methods for the same thing, when only one is going to be used. **You've loaded in some files.**
<p role="code-header">An example solution using <code>stow</code></p>
```python
import stow
import json
import datetime
import typing

def loadInConfigs(configDirectory: str) -> dict:
    """ Open and parse the system configurations

    Args:
        configDirectory: The path to the config directory

    Returns:
        dict: A dictionary of configuration names to values

    Raises:
        FileNotFoundError: if the configDirectory path does not exist
    """

    with stow.open(stow.join(configDirectory, "config1.json"), "r") as handle:
        config1 = json.load(handle)

    with stow.open(stow.join(configDirectory, "config2.json"), "r") as handle:
        config2 = json.load(handle)

    with stow.open(stow.join(configDirectory, "config3.json"), "r") as handle:
        config3 = json.load(handle)

    combined = {"lazers": config1, "cannons": config2, "doors": config3}

    return combined

def reloadConfigIfUpdated(configPath: str, time: typing.Optional[datetime.datetime] = None) -> typing.Optional[dict]:
    """ Fetch and return config if it has been updated, otherwise return None """

    if time is None or stow.artefact(configPath).modifiedTime > time:
        with stow.open(configPath) as handle:
            return json.load(handle)

    return None

# Demonstrate how the function is called with different managers
configs = loadInConfigs('/local/app/configs') # local
configs = loadInConfigs('s3://organisation/project/team/live/app/configs') # S3
configs = loadInConfigs('ssh://admin:password@.../configs') # SSH
```
And with that, you can handle configuration files stored locally, on S3, or on another container. Simple yet powerful.
## Installation
You can install the latest release of `stow` with `pip`, or pin a specific version:
```bash
$ pip install stow
$ pip install stow==1.0.0
```
To use `stow`, simply import the package and begin to use its rich interface.
```python
import stow
stow.ls()
```
!!! Note
    The latest development version can always be found on [GitHub](https://github.com/Kieran-Bacon/stow){target=_blank}.

    For best results, please ensure your version of Python is up-to-date. For more information on how to get the latest version of Python, please refer to the official [Python documentation](https://www.python.org/downloads/){target=_blank}.
## Paths, Artefacts, and Managers
<p role="list-header"> Conceptually, <code>stow</code> defines two fundamental objects:</p>
- `Artefact` - A storage object such as a file or directory; and
- `Manager` - An orchestration object for a storage implementation such as s3
**_Paths_** do not have their own object; they are represented as strings. `Artefacts` wrap files and directories and provide an interface to interact with storage items directly. `Managers` privately define how certain actions are carried out on a given storage implementation, which is then accessed through a generic public interface. This provides the necessary level of abstraction so that your application code can be data agnostic.
### Paths
`stow` doesn't implement a __*path*__ object and instead uses strings just as any `os.path` method would. However, `Artefact` objects are [**path-like**](https://docs.python.org/3/glossary.html#term-path-like-object){target=_blank} which means they will be compatible with `os` methods just as a path object from `pathlib` would be.
```python
>>> import os
>>> import stow
>>> stow.join('/workspace', 'stow')
'/workspace/stow'
>>> os.path.join(stow.artefact('/workspace'), 'stow')
'/workspace/stow'
# On Windows
>>> os.path.join(stow.artefact(), 'bin')
'c:\\Users\\kieran\\Projects\\Personal\\stow\\bin'
```
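The path-like behaviour described above relies on Python's `os.PathLike` protocol. As a minimal sketch (the `ExampleArtefact` class is hypothetical and is not how `stow` implements its artefacts), any object that defines `__fspath__` is accepted by the standard path functions. `posixpath` is used here so that the result is the same on every platform.

```python
import os
import posixpath


class ExampleArtefact:
    """Hypothetical stand-in for an artefact; not stow's actual class."""

    def __init__(self, path: str) -> None:
        self._path = path

    def __fspath__(self) -> str:
        # The path-like protocol: return the path as a plain string
        return self._path


workspace = ExampleArtefact("/workspace")

# os.fspath resolves any path-like object to its string form
as_string = os.fspath(workspace)

# Path functions call os.fspath internally, so the object can be passed directly
joined = posixpath.join(workspace, "stow")
print(joined)
```

Because the conversion happens through `__fspath__`, the object never needs to subclass `str` to be usable wherever a path is expected.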
!!! Warning
    Remote artefacts are not _available_ to `os` methods (hence this package), so you are encouraged to use `stow` methods. All `os.path` methods are available at the top level of `stow`.
Importantly, `stow` handles paths to remote files just as smoothly as paths to local files, which `os.path` cannot do on its own.
```python
>>> stow.join('s3://example-bucket/data', 'data.csv')
's3://example-bucket/data/data.csv'
>>> stow.getmtime('s3://example-bucket/data/data.csv')
1617381185.341602
```
### Artefacts
An `Artefact` represents a storage object and is subclassed into `stow.File` and `stow.Directory`. These objects provide convenient methods for accessing their contents and extracting relevant metadata. `Artefact` objects are created just in time to serve a request and act as pointers to the local or remote objects. File contents are not downloaded until an explicit method to do so is called.
```python
for artefact in stow.ls('~'):
    if isinstance(artefact, stow.Directory):
        for file in artefact.ls():
            print(file)

    else:
        print(artefact)

home = stow.artefact('~')
print(home['file.txt'].content) # Explicit call to get the file's contents
```
All `Artefact` objects belong to a `Manager`, which orchestrates communication between your session and the storage medium. `Artefacts` are not aware of the storage implementation; they draw on the public interface of the `Manager` they belong to in order to provide their functionality. This point becomes important when considering extending `stow` to an additional storage implementation.
!!! Important
    `Artefact` objects are not guaranteed to exist! Read below.
As you can hold onto references to `Artefacts` after they have been deleted (either via the stow interface or another method), you can end up attempting to access information for items that no longer exist. Any interaction with an `Artefact` will inform you if that is the case.
```python
>>> file = stow.artefact('~/file.txt')
>>> file.delete()
>>> file.content
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\kieran\Projects\Personal\stow\stow\artefacts.py", line 35, in __getattribute__
raise exceptions.ArtefactNoLongerExists(
stow.exceptions.ArtefactNoLongerExists: Artefact <class 'stow.artefacts.File'> /Users/kieran/file.txt no longer exists
```
That being said, update, overwrite, copy, and move operations will update the `Artefact` object accordingly, assuming the path exists and the location is of the same type.
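The traceback above shows the existence check being raised from `__getattribute__`. A minimal sketch of how such a guard could work is given here; `ExampleFile` and `ExampleArtefactNoLongerExists` are hypothetical names used for illustration only, and the real `stow` classes may behave differently.

```python
class ExampleArtefactNoLongerExists(Exception):
    """Hypothetical stand-in for stow.exceptions.ArtefactNoLongerExists."""


class ExampleFile:
    """Hypothetical artefact that refuses access once it has been deleted."""

    def __init__(self, path: str, content: str) -> None:
        self.path = path
        self.content = content
        self._exists = True

    def __getattribute__(self, name: str):
        # Private and dunder names stay reachable so the guard itself can run
        exists = object.__getattribute__(self, "_exists")
        if not exists and not name.startswith("_"):
            path = object.__getattribute__(self, "path")
            raise ExampleArtefactNoLongerExists(f"{path} no longer exists")
        return object.__getattribute__(self, name)

    def delete(self) -> None:
        # Only the flag changes; existing references remain but become stale
        self._exists = False


file = ExampleFile("/tmp/file.txt", "hello")
first_read = file.content

file.delete()
try:
    file.content
    stale = False
except ExampleArtefactNoLongerExists:
    stale = True
```

Intercepting every public attribute means a stale reference fails loudly on first use rather than returning outdated metadata.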
### Managers
`Manager` objects represent a specific storage medium, and they will orchestrate communication between your active interpreter and the storage provider. They all adhere to a rich `Manager` interface which includes definitions for all of the `os.path` methods.
`Manager` objects are created behind the scenes for many of `stow`'s stateless methods to process those calls. To avoid multiple definitions for the same storage provider, `Manager` objects are cached. `Managers` initialised directly will not be cached. You are encouraged to make use of the `Manager` cache by initialising `Managers` with one of the following methods: `stow.find`, `stow.connect`, or `stow.parseURL`.
`Managers` do not expect to process protocols and path parameters when they are used directly. Internally, `Managers` use the Unix path style for displaying and creating `Artefact` paths, so a path that is valid for one `Manager` is valid for all of them.
```python
>>> manager = stow.connect(manager='s3', bucket='example-bucket')
>>> manager['/directory/file1.txt']
<stow.File: /directory/file1.txt modified(2021-04-07 18:14:11.473302+00:00) size(0 bytes)>
>>> stow.artefact('s3://example-bucket/directory/file1.txt')
<stow.File: /directory/file1.txt modified(2021-04-07 18:14:11.473302+00:00) size(0 bytes)>
```
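The example above shows the same file addressed once through a qualified URL and once through a manager-relative path. As a rough sketch of that split, the standard library's `urllib.parse.urlsplit` can separate the protocol, the storage location, and the Unix-style path. The helper name `split_artefact_url` and the `"file"` default are assumptions for illustration; this is not `stow.parseURL`.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_artefact_url(url: str) -> Tuple[str, str, str]:
    """Split a qualified URL into (protocol, location, manager-relative path)."""
    parts = urlsplit(url)
    protocol = parts.scheme or "file"  # assume plain paths are local
    return protocol, parts.netloc, parts.path or "/"


s3_parts = split_artefact_url("s3://example-bucket/directory/file1.txt")
local_parts = split_artefact_url("/local/app/configs")
bucket_root = split_artefact_url("s3://example-bucket")
```

Here `s3_parts` carries the bucket name separately from `/directory/file1.txt`, which is the form of path a manager would work with directly.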
!!! Note
    You can completely forget about `Managers`! The stateless interface is sufficiently expressive to do everything you would need without having to create a `Manager` object. From a user's perspective, they have very few beneficial use cases; one such use case is shown below.
Since `Managers` hold information about their storage provider and want to use valid paths, you can define methods to use the `Manager` objects with the simplified path and have that work across multiple backends.
```python
def managerAgnosticMethod(manager: stow.Manager):

    # do stuff

    with manager.open('/specific/file/path') as handle:
        # do more stuff..
        pass


s3 = stow.connect(manager='s3', bucket='example-bucket')
ssh = stow.connect(manager='ssh', host='ec2....', username='ubuntu', ...)

managerAgnosticMethod(s3)
managerAgnosticMethod(ssh)
```
In the example, we have specified a path inside our function and given no consideration to what backend we may be using. The `Manager` passed will interpret the path relative to itself. This would be as opposed to simply constructing that path with the stateless interface.
```python
def managerAgnosticMethod(base: str):

    # do stuff

    with stow.open(stow.join(base, 'specific/file/path')) as handle:
        # do more stuff..
        pass

managerAgnosticMethod("s3://example-bucket")
managerAgnosticMethod("ssh://ubuntu:***@ec2...../home/ubuntu")
```
As the `Manager` interface is just as extensive and full-featured as the stateless interface, either approach is appropriate. The `Manager` approach as described will likely lead to fewer lines being written in the general case, but it comes with the cost of having to understand what a `Manager` object is.
## Ways of working
### Ensuring artefacts
Many Python packages require that artefacts be local because they interact with them directly. `stow` lets you use these packages with remote objects by `localising` the objects before their use.
```python
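import cv2  # OpenCV: an example of a library that needs a local path
import stow

# `framedata` is assumed to be an image array produced elsewhere in your program

with stow.localise('/home/ubuntu/image.jpg') as abspath:
    cv2.imwrite(abspath, framedata)

with stow.localise('s3://bucket/image.jpg') as abspath:
    cv2.imwrite(abspath, framedata)

with stow.localise('ssh://Host/bucket/image.jpg') as abspath:
    cv2.imwrite(abspath, framedata)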
```
A `localised` object will be addressable on disk, and any changes to the object will be pushed to the remote instance when the context is closed. For local artefacts, the context value will simply be the absolute path to that artefact.
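The push-on-close behaviour described above can be sketched with a standard-library context manager. In this illustration a plain dictionary stands in for the remote store, and `example_localise` is a hypothetical helper; it shows the general pattern only and is not `stow.localise`.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Dict, Iterator


@contextlib.contextmanager
def example_localise(remote: Dict[str, bytes], key: str) -> Iterator[str]:
    """Yield a local path for `key`; push the file back when the block ends."""
    workdir = tempfile.mkdtemp()
    local_path = os.path.join(workdir, os.path.basename(key))
    try:
        # Pull the object down first if it already exists remotely
        if key in remote:
            with open(local_path, "wb") as handle:
                handle.write(remote[key])

        yield local_path

        # On a clean exit, push whatever now exists at the local path
        if os.path.isfile(local_path):
            with open(local_path, "rb") as handle:
                remote[key] = handle.read()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


store: Dict[str, bytes] = {"existing.txt": b"old"}

# The remote key does not need to exist before the context is entered
with example_localise(store, "new.txt") as path:
    with open(path, "wb") as handle:
        handle.write(b"created locally")

# An existing object is downloaded, edited on disk, then pushed back
with example_localise(store, "existing.txt") as path:
    with open(path, "ab") as handle:
        handle.write(b" and updated")
```

The sketch only pushes on a clean exit; if the body raises, the temporary directory is removed and the store is left untouched. Whether `stow` makes the same choice is not stated in this document.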
**It may be better to think about localising as setting a link between a local path and a remote one**, because the remote path does not have to exist at the point of `localisation`. `stow` will inspect the artefact once the context is closed and handle it accordingly.
```python
import os
import stow

with stow.localise("s3://example-bucket") as abspath:
    stow.mkdir(abspath) # make path a directory iff path does not exist

    # Do some work in a base directory
    for i in range(10):
        with open(os.path.join(abspath, str(i)), "w") as handle:
            handle.write(f"line {i}")
```
!!! Note
    AWS credentials were set up for the user via one of the methods described <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html" target="_blank">__*here*__</a>. This allows `stow` to communicate with S3 simply by using the qualified URL for the artefacts. Otherwise, the IAM secret keys must be passed to the manager as keyword arguments, which are documented in [managers](managers).
### No direct communication between remote managers
`Artefacts` that are moved between different remote managers will be downloaded and then pushed up to the destination manager. Though you might imagine that some managers (`ssh`) could write directly to the destination, this is not currently supported (and most managers will never be able to do this).
When moving `Artefacts` around a single remote manager, operations such as `mv` and `cp` should take place solely on the remote machine without the artefacts being downloaded, but this will depend on the API of the storage medium.
**Be aware that you will need to have storage available for these types of transfers**.
```python
import stow

stow.cp("s3://example-bucket/here","s3://different-bucket/here")

for art in stow.ls("ssh://ubuntu@ec2../files/here"):
    if isinstance(art, stow.File):
        stow.put(art, "s3://example-bucket/instance/")
```
### Dealing with added latency
Working with remote file systems will incur noticeable latency (in comparison to local artefacts), which may pose a problem for a system. To reduce this increased IO time, you will need to improve your connectivity to the remote manager and cut down on the number of operations you are performing.
This second point is something we can address in our programs directly, and it's a good habit even when working exclusively with local files. You should try to minimise the number of read and write calls you make, and program to push and pull data from the remote as little as possible.
!!! Note
    Read and write operations are not the same as reading metadata/listing directories. These operations are extremely cheap to execute and values are cached whenever possible.
Some `managers` may be able to push and pull multiple `artefacts` more efficiently if they can do it in a single request. By `localising` directories, we can effectively bulk download and upload `artefacts`.
Furthermore, once `localised`, interactions with `files` and `directories` are lightning fast as they will be local objects. Reading, writing, and appending won't require communication with the remote manager.
```python
import stow

# Five separate pushes of files to s3
for i in range(5):
    with stow.open("s3://example-bucket/files/{}".format(i), "w") as handle:
        handle.write(str(i))  # write() expects a string, not an int

# One push of a directory of five files
with stow.localise("s3://example-bucket/files", mode="a") as abspath:
    for i in range(5):
        with stow.open(stow.join(abspath, str(i)), "w") as handle:
            handle.write(str(i))
```
Caveats to this approach:
- This bulking method requires the files you are working on to touch the local file system before they are pushed to the remote, so if local storage is a scarce resource then this approach may not be feasible.
- The performance of the bulk upload is dependent on the availability of the underlying backend. If the storage provider doesn't provide a utility for bulk uploading then there isn't an improvement to be had.
- Network usage will be grouped at a single point (exiting of the localise context) in your program flow.
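To make the trade-off concrete, the following sketch counts simulated requests for the two approaches. `CountingStore` and its `put_many` method are hypothetical, and the sketch assumes a backend that can accept several objects in one request, which (as the second caveat notes) not every storage provider can.

```python
from typing import Dict


class CountingStore:
    """Hypothetical remote store that counts how many requests it receives."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}
        self.requests = 0

    def put(self, key: str, body: str) -> None:
        # One request per object
        self.requests += 1
        self.objects[key] = body

    def put_many(self, items: Dict[str, str]) -> None:
        # One request for the whole batch (assumes the backend supports this)
        self.requests += 1
        self.objects.update(items)


# Writing each file straight to the remote: five requests
one_by_one = CountingStore()
for i in range(5):
    one_by_one.put(f"files/{i}", str(i))

# Staging the files locally, then pushing them together: one request
batched = CountingStore()
staged = {f"files/{i}": str(i) for i in range(5)}
batched.put_many(staged)
```

Both stores end up holding the same five objects; only the number of round trips differs, and the staged dictionary plays the role of the local storage that the first caveat warns about.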
Raw data
{
"_id": null,
"home_page": null,
"name": "stow",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "aws s3 boto3 ssh os",
"author": "Kieran Bacon",
"author_email": "kieran.bacon@outlook.com",
"download_url": "https://files.pythonhosted.org/packages/a2/9b/dd2fc7081de22ad04b2321f63cbfd0244e2e19186173a25abdc8dba1eba4/stow-1.4.1.tar.gz",
"platform": null,
"description": "# Stow\r\n\r\n`stow` is a package that supercharges your interactions with files and directories, and enables you to write filesystem agnostic code. With `stow` you can access and manipulate local and remote artefacts seamlessly with a rich and familiar interface. `stow` gives abstraction from storage implementations and solves compatibility issues, allowing code to be highly flexible.\r\n\r\n`stow` is meant to be a drop in replacement for the `os.path` module, providing full coverage of its interface. Furthermore, `stow` extends the interface to work with remote files and directories and to include methods that follow conventional artefact manipulation paradigms like `put`, `get`, `ls`, `rm`, in a concise and highly functional manner.\r\n\r\n```python\r\nimport stow\r\n\r\nfor art in stow.ls():\r\n print(art)\r\n# <stow.Directory: /home/kieran/stow/.pytest_cache>\r\n# <stow.Directory: /home/kieran/stow/tests>\r\n# <stow.File: /home/kieran/stow/mkdocs.yml modified(2020-06-27 10:24:10.909885) size(68 bytes)>\r\n# <stow.File: /home/kieran/stow/requirements.txt modified(2020-05-25 14:00:59.423165) size(16 bytes)>\r\n# ...\r\n\r\nwith stow.open(\"requirements.txt\", \"r\") as handle:\r\n print(handle.read())\r\n# tqdm\r\n# pyini\r\n# boto3\r\n\r\n\r\nstow.put(\"requirements.txt\", \"s3://example-bucket/projects/stow/requirements.txt\")\r\n\r\nwith stow.open(\"s3://example-bucket/projects/stow/requirements.txt\", \"r\") as handle:\r\n print(handle.read())\r\n# tqdm\r\n# pyini\r\n# boto3\r\n\r\nprint(stow.getmtime(\"s3://example-bucket/projects/stow/requirements.txt\"))\r\n# 1617381185.341602\r\n```\r\n\r\n## Why use stow?\r\n\r\n`stow` offers advantages for developers who work locally, and those that work remotely. `stow` aims to simply and empower all interactions with files and directories, solving many of the problems that you see project to project. 
Tasks such as filtering directories, accessing file metadata, recursively searching for files, are now as easy as you'd expect them to be.\r\n\r\n<p role=\"code-header\">For example, this...</p>\r\n\r\n```python\r\nimport os\r\nimport shutil\r\nimport datetime\r\n\r\nsource = 'path'\r\ndestination = 'path'\r\nrecent = datetime.datetime(2021, 5, 4)\r\n\r\nfor root, dirs, files in os.walk(source):\r\n\r\n for name in files:\r\n filepath = os.path.join(root, name)\r\n modifiedTime = datetime.datetime.fromtimestamp(os.path.getmtime(filepath))\r\n\r\n if modifiedTime > recent:\r\n shutil.cp(filepath, os.path.join(destination, os.path.relpath(filepath, source))\r\n```\r\n\r\n<p role=\"code-header\">will become this...</p>\r\n\r\n```python\r\nimport stow\r\nimport datetime\r\n\r\nsource = 'path'\r\ndestination = 'path'\r\nrecent = datetime.datetime(2021, 5, 4)\r\n\r\nfor file in stow.ls(source, recursive=True):\r\n if isinstance(file, stow.File) and file.modifiedTime > recent:\r\n stow.cp(file, stow.join(destination, stow.relpath(file, source)))\r\n```\r\n\r\n**However**, the ultimate power that `stow` provides is the time saving and confidence brought by removing the need to write complicated methods for handling multiple backend storage solutions in your application.\r\n\r\nEspecially when you consider effort spent supporting the various stages of an applications development cycle, to then simply abandon good work when only a particular implementation is used live. (Yes, preferably all those stages are identical, but, this is never the case).\r\n\r\n**You shouldn't be focusing on storage management, you should be focusing on your solutions**\r\n\r\nConsider the following scenario: As part of a development team, you have been asked to write the code that handles the loading of application configuration, and you've been sent a few json files. 
You create and test a method that reads in the configuration files, and passes them on to the next step in your application.\r\n\r\nThis works perfectly fine locally, but, it turns out that the application is going to be deployed as a docker container running in AWS ecs. The configuration files will need to be hosted on s3 and accessed by the container on startup.\r\n\r\nWell, you have to write a different method that uses `boto3` to connect to the bucket and pull them out. You setup a test bucket and an application IAM user with optimistic permissions to test your new method with, and get cracking.\r\n\r\nYou'll then have to add in some logic before this section in the application to handle the possibility of reading the files locally or remotely. This may come in the form of changes to your cli, api, etc, so you do that.\r\n\r\nThen from up high, word comes that some of the configuration you are doing will need to change dynamically while the application is running. Your team has decided that the app will monitor one of the configuration files for changes and reload it when it does.\r\n\r\nTo maintain the local and remote duality of your application, you get to work updating both methods to check for updates, and then test.\r\n\r\n**so what have you achieved?** Sad to say, very little. You've spent a lot of time getting up to speed with `boto3` (or re-implementing work from another project), and then you dived back into the deep end trying to understand how to get the modified time of files out. You've supported two methods for the same thing, when only one is going to be used. 
**You've loaded in some files.**\r\n\r\n<p role=\"code-header\">An example solution using <code>stow</code></p>\r\n\r\n```python\r\nimport stow\r\nimport json\r\nimport datetime\r\nimport typing\r\n\r\ndef loadInConfigs(configDirectory: str) -> dict:\r\n \"\"\" Open and parser the system configurations\r\n\r\n Args:\r\n configDirectory: The path to the config directory\r\n\r\n Returns:\r\n dict: A diction of configuration names to values\r\n\r\n Raises:\r\n FileNotFoundError: if the configDirectory path does not exist\r\n \"\"\"\r\n\r\n with stow.open(stow.join(configDirectory, \"config1.json\"), \"r\") as handle:\r\n config1 = json.load(handle)\r\n\r\n with stow.open(stow.join(configDirectory, \"config2.json\"), \"r\") as handle:\r\n config2 = json.load(handle)\r\n\r\n with stow.open(stow.join(configDirectory, \"config3.json\"), \"r\") as handle:\r\n config3 = json.load(handle)\r\n\r\n combined = {\"lazers\": config1, \"cannons\": config2, \"doors\": config3}\r\n\r\n return combined\r\n\r\ndef reloadConfigIfUpdated(configPath: str, time: datetime = None) -> typing.Union[dict, None]:\r\n \"\"\" Fetch and return config if it has been updated \"\"\"\r\n\r\n if time is None or stow.artefact(configPath).modifiedTime > time:\r\n with stow.open(configPath) as handle:\r\n return json.load(handle)\r\n\r\n# Demonstrate how the function is called with different managers\r\nconfigs = loadInConfigs('/local/app/configs') # local\r\nconfigs = loadInConfigs('s3://organisation/project/team/live/app/configs') # S3\r\nconfigs = loadInConfigs('ssh://admin:password@.../configs') # SSH\r\n```\r\n\r\nAnd with that you can handle configurations files being stored locally, on s3, on another container. 
Simple yet powerful.\r\n\r\n## Installation\r\n\r\nYou can get stow by:\r\n\r\n```bash\r\n$ pip install stow\r\n$ pip install stow==1.0.0\r\n```\r\n\r\nTo use `stow`, simply import the package and begin to use its rich interface\r\n\r\n```python\r\nimport stow\r\n\r\nstow.ls()\r\n```\r\n\r\n!!! Note\r\n The latest development version can always be found on [GitHub](https://github.com/Kieran-Bacon/stow){target=_blank}.\r\n\r\n For best results, please ensure your version of Python is up-to-date. For more information on how to get the latest version of Python, please refer to the official [Python documentation](https://www.python.org/downloads/){target=_blank}.\r\n\r\n## Paths, Artefacts, and Managers\r\n\r\n<p role=\"list-header\"> Conceptually, <code>stow</code> defines two fundamental objects:</p>\r\n- `Artefact` - A storage object such as a file or directory; and\r\n- `Manager` - An orchestration object for a storage implementation such as s3\r\n\r\n**_Paths_** do not have their own object, paths are represented as strings. `Artefacts` wraps files and directories and provides an interface to interact with storage items directly. `Managers` privately define how certain actions will be carried out on a given storage implementation, which is then accced through a generic public interface. This provides the necessary level of abstraction so that your application code can be data agnostic.\r\n\r\n### Paths\r\n\r\n`stow` doesn't implement a __*path*__ object and instead uses strings just as any `os.path` method would. 
However, `Artefact` objects are [**path-like**](https://docs.python.org/3/glossary.html#term-path-like-object){target=_blank} which means they will be compatible with `os` methods just as a path object from `pathlib` would be.\r\n\r\n```python\r\n>>> import os\r\n>>> import stow\r\n>>> stow.join('/workspace', 'stow')\r\n'/workspace/stow'\r\n>>> os.path.join(stow.artefact('/workspace'), 'stow')\r\n'/workspace/stow'\r\n\r\n# On windows\r\n>>> os.path.join(stow.artefact(), 'bin')\r\n'c:\\\\Users\\\\kieran\\\\Projects\\\\Personal\\\\stow\\\\bin'\r\n```\r\n\r\n!!! Warning\r\n Remote artefacts will not be _available_ for use by `os` methods (hence this package) so you are encouraged to use `stow` methods. All `os.path` methods are available on the top level of `stow`\r\n\r\nImportantly, `stow` handles paths to remote files just as smoothly as any local file, power you cannot get anywhere else.\r\n\r\n```python\r\n>>> stow.join('s3://example-bucket/data', 'data.csv')\r\n's3://example-bucket/data/data.csv'\r\n>>> stow.getmtime('s3://example-bucket/data/data.csv')\r\n1617381185.341602\r\n```\r\n\r\n### Artefacts\r\n\r\nAn `Artefact` represents a storage object which is then subclassed into `stow.File` and `stow.Directory`. These objects provide convenient methods for accessing their contents and extracting relevant metadata. `Artefact` objects are created just in time to serve a request and act as pointers to the local/remote objects. File contents is not downloaded until a explicit method to do so is called.\r\n\r\n```python\r\nfor artefact in stow.ls('~'):\r\n if isinstance(artefact, stow.Directory):\r\n for file in artefact.ls():\r\n print(file)\r\n\r\n else:\r\n print(artefact)\r\n\r\nhome = stow.artefact('~')\r\nprint(home['file.txt'].content) # Explicit call to get the file's contents\r\n```\r\n\r\nAll `Artefact` objects belong to a `Manager` which orchestrates communication between your session and the storage medium. 
`Artefacts` are not storage implementation aware, and draw on the public interface of the manager object they belong to to provide their functionality. This point will become important when considering extending stow to an additional storage implementation.\r\n\r\n!!! Important\r\n `Artefact` objects are not guaranteed to exist! Read below\r\n\r\nAs you can hold onto references to `Artefacts` after they have been deleted (either via the stow interface or another method), you can end up attempting to access information for items that no longer exist. Any interaction with an `Artefact` will inform you if that is the case.\r\n\r\n```python\r\n>>> file = stow.artefact('~/file.txt')\r\n>>> file.delete()\r\n>>> file.content\r\nTraceback (most recent call last):\r\n File \"<stdin>\", line 1, in <module>\r\n File \"C:\\Users\\kieran\\Projects\\Personal\\stow\\stow\\artefacts.py\", line 35, in __getattribute__\r\n raise exceptions.ArtefactNoLongerExists(\r\nstow.exceptions.ArtefactNoLongerExists: Artefact <class 'stow.artefacts.File'> /Users/kieran/file.txt no longer exists\r\n```\r\n\r\nThat being said, updates, overwrites, copies, and move operations will update the `Artefact` object accordingly, assuming the path exists and the locations is of the same type.\r\n### Managers\r\n\r\n`Manager` objects represent a specific storage medium, and they will orchestrate communication between your active interpreter and the storage provider. They all adhere to a rich `Manager` interface which includes definitions for all of the `os.path` methods.\r\n\r\n`Manager` objects are created behind the scene for many of `stows` stateless methods to process those calls. To avoid multiple definitions for the same storage providers, `Manager` objects are cached. `Managers` initialised directly will not be cached. 
It is encouraged to make use of the `Manager` cache by initialising `Managers` using the following methods `stow.find`, `stow.connect`, and `stow.parseURL`.\r\n\r\n`Managers` do not expect to process protocols and path params when they are being used directly. `Managers` will internally use the unix style path standard for displaying and creating `Artefacts` paths. This means that a valid path is valid for all `Managers`.\r\n\r\n```python\r\n>>> manager = stow.connect(manager='s3', bucket='example-bucket')\r\n>>> manager['/directory/file1.txt']\r\n<stow.File: /directory/file1.txt modified(2021-04-07 18:14:11.473302+00:00) size(0 bytes)>\r\n>>> stow.artefact('s3://example-bucket/directory/file1.txt')\r\n<stow.File: /directory/file1.txt modified(2021-04-07 18:14:11.473302+00:00) size(0 bytes)>\r\n```\r\n\r\n!!! Note\r\n You can completely forget about `Managers`! The stateless interface is sufficiently expressive to do everything you would need, without having to create a `Manager` object. From a users perspective, they have a very limited beneficial use case, one such use case is shown below.\r\n\r\nSince `Managers` hold information about their storage provider and want to use valid paths, you can define methods to use the `Manager` objects with the simplified path and have that work across multiple backends.\r\n\r\n```python\r\ndef managerAgnosticMethod(manager: stow.Manager):\r\n\r\n # do stuff\r\n\r\n with manager.open('/specific/file/path') as handle:\r\n # do more stuff..\r\n\r\n\r\ns3 = stow.connect(manager='s3', bucket='example-bucket')\r\nssh = stow.connect(manager='ssh', host='ec2....', username='ubuntu', ...)\r\n\r\nmanagerAgnosticMethod(s3)\r\nmanagerAgnosticMethod(ssh)\r\n```\r\n\r\nIn the example, we have specified a path inside our function and given no consideration to what backend we may be using. The `Manager` passed will interpret the path relative to itself. 
This is in contrast to simply constructing that path with the stateless interface.\r\n\r\n```python\r\nimport stow\r\n\r\ndef managerAgnosticMethod(base: str):\r\n\r\n    # do stuff\r\n\r\n    with stow.open(stow.join(base, 'specific/file/path')) as handle:\r\n        content = handle.read()  # do more stuff..\r\n\r\nmanagerAgnosticMethod(\"s3://example-bucket\")\r\nmanagerAgnosticMethod(\"ssh://ubuntu:***@ec2...../home/ubuntu\")\r\n```\r\n\r\nAs the `Manager` interface is just as extensive and fully featured as the stateless interface, either method would be appropriate. The `Manager` method as described will likely lead to fewer lines being written in the general case, but it comes with the cost of having to understand what a `Manager` object is.\r\n\r\n## Ways of working\r\n\r\n### Ensuring artefacts\r\n\r\nA lot of packages in Python require that artefacts be local, because they interact with them directly. `stow` provides you with the ability to use these packages with remote objects by `localising` the objects before their use.\r\n\r\n```python\r\nimport cv2\r\nimport stow\r\n\r\n# framedata is assumed to be an image array produced earlier in your program\r\nwith stow.localise('/home/ubuntu/image.jpg') as abspath:\r\n    cv2.imwrite(abspath, framedata)\r\n\r\nwith stow.localise('s3://bucket/image.jpg') as abspath:\r\n    cv2.imwrite(abspath, framedata)\r\n\r\nwith stow.localise('ssh://Host/bucket/image.jpg') as abspath:\r\n    cv2.imwrite(abspath, framedata)\r\n```\r\n\r\nA `localised` object will be addressable on disk, and any changes to the object will be pushed to the remote instance when the context is closed. For local artefacts, the context value will simply be the absolute path to that artefact.\r\n\r\n**It may be better to think about localising as setting a link between a local path and a remote one**, because the remote path does not have to exist at the point of `localisation`. 
`stow` will inspect the artefact once the context is closed and handle it accordingly.\r\n\r\n```python\r\nimport os\r\n\r\nimport stow\r\n\r\nwith stow.localise(\"s3://example-bucket\") as abspath:\r\n    stow.mkdir(abspath) # make path a directory iff path does not exist\r\n\r\n    # Do some work in a base directory\r\n    for i in range(10):\r\n        with open(os.path.join(abspath, str(i)), \"w\") as handle:\r\n            handle.write(f\"line {i}\")\r\n```\r\n\r\n!!! Note\r\n    AWS credentials were set up for the user via one of the methods that can be read about <a href=\"https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html\" target=\"_blank\">__*here*__</a>. This allows `stow` to communicate with S3 simply by using the qualified URL for the artefacts. Otherwise, the IAM secret keys must be passed to the manager as keyword arguments, as described in [managers](managers).\r\n\r\n### No direct communication between remote managers\r\n\r\n`Artefacts` that are being moved between different remote managers will be downloaded and then pushed up onto the destination manager. 
Though you might imagine that some managers (`ssh`) could write directly to the destination, this is not currently supported (and most managers will never be able to do this).\r\n\r\nWhen moving `Artefacts` around a single remote manager, operations such as `mv` and `cp` should take place solely on the remote machine and nothing should be downloaded, though this depends on the API of the storage medium.\r\n\r\n**Be aware that you will need to have local storage available for these types of transfers**.\r\n\r\n```python\r\nimport stow\r\n\r\nstow.cp(\"s3://example-bucket/here\", \"s3://different-bucket/here\")\r\n\r\nfor art in stow.ls(\"ssh://ubuntu@ec2../files/here\"):\r\n    if isinstance(art, stow.File):\r\n        stow.put(art, \"s3://example-bucket/instance/\")\r\n```\r\n\r\n### Dealing with added latency\r\n\r\nWorking with remote file systems will incur noticeable amounts of latency (in comparison to local artefacts), which may pose a problem for a system. To reduce this increased IO time, you will need to improve your connectivity to the remote manager and cut down on the number of operations you are performing.\r\n\r\nThis second point is something we can address in our programs directly, and it's a good habit even when working exclusively with local files. You should try to minimise the number of read and write calls you have to make, and program to push and pull data from the remote as little as possible.\r\n\r\n!!! Note\r\n    Read and write operations are not the same as reading metadata or listing directories. The latter operations are extremely cheap to execute and values are cached whenever possible.\r\n\r\nSome `managers` may be able to push and pull multiple `artefacts` more efficiently if they can do it in a single request. By `localising` directories, we can effectively bulk download and upload `artefacts`.\r\n\r\nFurthermore, once `localised`, interactions with `files` and `directories` are lightning fast as they will be local objects. 
Reading from, writing to, and appending to them won't require communication with the remote manager.\r\n\r\n```python\r\nimport stow\r\n\r\n# Five separate pushes of files to S3\r\nfor i in range(5):\r\n    with stow.open(\"s3://example-bucket/files/{}\".format(i), \"w\") as handle:\r\n        handle.write(str(i))  # text-mode handles expect a str, not an int\r\n\r\n# One push of a directory of five files\r\nwith stow.localise(\"s3://example-bucket/files\", mode=\"a\") as abspath:\r\n    for i in range(5):\r\n        with stow.open(stow.join(abspath, str(i)), \"w\") as handle:\r\n            handle.write(str(i))\r\n```\r\n\r\nCaveats to this approach:\r\n\r\n- This bulking method requires the files you are working on to touch the local file system before they are pushed to the remote, so if local storage is a scarce resource then this approach may not be feasible.\r\n- The performance of the bulk upload depends on what the underlying backend supports. If the storage provider doesn't provide a utility for bulk uploading then there isn't an improvement to be had.\r\n- Network usage will be grouped at a single point (the exit of the localise context) in your program flow.\r\n",
"bugtrack_url": null,
"license": null,
"summary": "stow artefacts anywhere, with ease",
"version": "1.4.1",
"project_urls": {
"Bug Tracker": "https://github.com/Kieran-Bacon/stow/issues",
"Documentation": "https://stow.readthedocs.io/en/latest/",
"Homepage": "https://github.com/Kieran-Bacon/stow"
},
"split_keywords": [
"aws",
"s3",
"boto3",
"ssh",
"os"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "25ee9ad5fb114561a6c08f51f9f13ad139ba11682ceebeebc1e1e08e5998284f",
"md5": "b5ea43e9f85bcadf7052a92e64d457d3",
"sha256": "a0373fc4fea1bbecf9db9bff8d5c5d30eb23e0890b88f2803ed682bc4ccaf6b5"
},
"downloads": -1,
"filename": "stow-1.4.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b5ea43e9f85bcadf7052a92e64d457d3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 80931,
"upload_time": "2024-04-25T09:47:12",
"upload_time_iso_8601": "2024-04-25T09:47:12.442951Z",
"url": "https://files.pythonhosted.org/packages/25/ee/9ad5fb114561a6c08f51f9f13ad139ba11682ceebeebc1e1e08e5998284f/stow-1.4.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a29bdd2fc7081de22ad04b2321f63cbfd0244e2e19186173a25abdc8dba1eba4",
"md5": "2eba0f54290dd6f6cb4b6e3747e22af0",
"sha256": "40b474a4d20bf9a75600dbf48949e228e94062d2188cd81c1380feb37428dde6"
},
"downloads": -1,
"filename": "stow-1.4.1.tar.gz",
"has_sig": false,
"md5_digest": "2eba0f54290dd6f6cb4b6e3747e22af0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 87574,
"upload_time": "2024-04-25T09:47:15",
"upload_time_iso_8601": "2024-04-25T09:47:15.954927Z",
"url": "https://files.pythonhosted.org/packages/a2/9b/dd2fc7081de22ad04b2321f63cbfd0244e2e19186173a25abdc8dba1eba4/stow-1.4.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-25 09:47:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Kieran-Bacon",
"github_project": "stow",
"travis_ci": false,
"coveralls": true,
"github_actions": false,
"lcname": "stow"
}