# fsmirror
## Installation
```bash
pip install fsmirror
```
### Functionality
`fsmirror` mirrors a project's filesystem layout for metadata tracking. It gives you
a direct path mapping between the code that generates data and the location, in a filesystem
or object store, where the data or artifacts it generates are stored.
### Example
Code lives at: <br>
`project/etl/my_etl_task.py::LiftDataTask`

Associated local `fsmirror` output: <br>
`project/etl/my_etl_task/LiftDataTask/out.parquet`

Associated S3 `fsmirror` output: <br>
`s3://my.bucket/project/etl/my_etl_task/LiftDataTask/out.parquet`
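
The mapping in this example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how `fsmirror` is implemented: `mirror_path` and its parameters are hypothetical names, and the sketch assumes the S3 layout ends in `/out.parquet` just like the local one, as the usage session further down suggests.

```python
from pathlib import PurePosixPath


def mirror_path(code_ref, output_name="out", output_format="parquet", base=""):
    """Map 'path/to/module.py::ObjectName' to a mirrored output path.

    Hypothetical helper for illustration; not part of the fsmirror API.
    """
    module_file, _, obj_name = code_ref.partition("::")
    # Drop the .py suffix so the module becomes a directory in the mirror.
    module_dir = PurePosixPath(module_file).with_suffix("")
    relative = module_dir / obj_name / f"{output_name}.{output_format}"
    # An optional base such as 's3://my.bucket' is prepended as-is.
    return f"{base.rstrip('/')}/{relative}" if base else str(relative)


local = mirror_path("project/etl/my_etl_task.py::LiftDataTask")
remote = mirror_path("project/etl/my_etl_task.py::LiftDataTask", base="s3://my.bucket")
print(local)
print(remote)
```

Keeping the code location as the only input means the storage location never has to be recorded separately: it can always be recomputed from the code reference.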
### Usage
* Create a configuration file like the one in `examples/example_config.yml`
* Set the config path:
```bash
export FSMIRROR_CONFIG_PATH=/your/project/path/config.yml
```
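
If you want to fail early when the variable is missing or points at the wrong place, a small check like the one below may help. It is a sketch using only the standard library; `resolve_config_path` is a hypothetical helper, not part of `fsmirror`, and how `fsmirror` itself reacts to a missing variable is not covered here.

```python
import os
from pathlib import Path


def resolve_config_path(env=None):
    """Return the config path named by FSMIRROR_CONFIG_PATH, or raise.

    Hypothetical helper for illustration; not part of the fsmirror API.
    """
    env = os.environ if env is None else env
    raw = env.get("FSMIRROR_CONFIG_PATH")
    if not raw:
        raise RuntimeError("FSMIRROR_CONFIG_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No config file at {path}")
    return path
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.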
The config file should look like the example:
```yaml
# artifacts
storage:
  # local, s3, gcs, blob
  provider: s3
  # root file path, bucket, etc.
  tenant: test.bucket
  # prefix - if 'MIRROR' will mirror filesystem
  namespace: MIRROR

# Each mirror should be a subdirectory
# within your project for example your
# orchestrator codebase lives at the
# following path:
#
# /opt/orchestrator
#
# To mirror this subdirectory we would
# add an "orchestrator" mirror as is
# done below
mirrors:
  fsmirror:
    # directory or subdirectory to split on
    root: fsmirror
    prefix: MIRROR
    output_name: out
    output_format: parquet

  aipipeline:
    root: aipipeline
    prefix: MIRROR
    output_name: out
    output_format: pkl
```
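
Once parsed, this file becomes a nested dictionary with a `storage` section and one entry per mirror. The sketch below checks such a dictionary for the keys that appear in the example. Treating every one of those keys as required is an assumption made for illustration, and `validate_config` is a hypothetical helper rather than something `fsmirror` provides.

```python
# Keys taken from the example config; treating them all as required is an assumption.
REQUIRED_STORAGE_KEYS = {"provider", "tenant", "namespace"}
REQUIRED_MIRROR_KEYS = {"root", "prefix", "output_name", "output_format"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    storage = config.get("storage") or {}
    for key in sorted(REQUIRED_STORAGE_KEYS - set(storage)):
        problems.append(f"storage.{key} is missing")
    mirrors = config.get("mirrors") or {}
    if not mirrors:
        problems.append("no mirrors defined")
    for name, mirror in mirrors.items():
        for key in sorted(REQUIRED_MIRROR_KEYS - set(mirror or {})):
            problems.append(f"mirrors.{name}.{key} is missing")
    return problems


example = {
    "storage": {"provider": "s3", "tenant": "test.bucket", "namespace": "MIRROR"},
    "mirrors": {
        "fsmirror": {"root": "fsmirror", "prefix": "MIRROR",
                     "output_name": "out", "output_format": "parquet"},
    },
}
print(validate_config(example))
```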
Use `fsmirror` to manage where artifacts are stored. The following pseudocode shows
how it is meant to be used:
```python
>>> from test_mirror import SomeTask, some_task
>>> from fsmirror import FSMirror, load_config
>>> load_config()
{'storage': {'provider': 's3', 'tenant': 'test.bucket', 'namespace': 'MIRROR'}, 'mirrors': {'fsmirror': {'root': 'fsmirror', 'prefix': 'MIRROR', 'output_name': 'out', 'output_format': 'parquet'}, 'aipipeline': {'root': 'aipipeline', 'prefix': 'MIRROR', 'output_name': 'out', 'output_format': 'pkl'}}}
>>> config = load_config()
>>> fm = FSMirror(config=config, mirror='fsmirror')
>>> fm.mirror_relative(some_task)
'fsmirror/tests/test_mirror/20240227160221/some_task'
>>> fm.mirror_relative(some_task, with_id=False)
'fsmirror/tests/test_mirror/some_task'
>>> fm.mirror_full(some_task)
's3://test.bucket/fsmirror/tests/test_mirror/20240227160221/some_task'
>>> fm.mirror_full_output(some_task)
's3://test.bucket/fsmirror/tests/test_mirror/20240227160221/some_task/out.parquet'
```
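
To make the shape of these return values concrete, the sketch below rebuilds the last URI from the config values by hand. It does not call `fsmirror`. The provider-to-scheme table, the reading of the 14-digit id as a `%Y%m%d%H%M%S` timestamp, and the helper names `run_id` and `full_output_uri` are all assumptions inferred from the output above, and the `namespace` and `prefix` settings are ignored.

```python
from datetime import datetime

# Assumed provider-to-scheme mapping, for illustration only.
SCHEMES = {"s3": "s3://", "gcs": "gs://", "local": ""}


def run_id(now):
    """Format a datetime as a 14-digit id like the one in the session above."""
    return now.strftime("%Y%m%d%H%M%S")


def full_output_uri(config, mirror, relative):
    """Join storage root, relative mirror path, and output file name."""
    storage = config["storage"]
    settings = config["mirrors"][mirror]
    root = SCHEMES[storage["provider"]] + storage["tenant"]
    filename = f"{settings['output_name']}.{settings['output_format']}"
    return "/".join([root, relative, filename])


config = {
    "storage": {"provider": "s3", "tenant": "test.bucket", "namespace": "MIRROR"},
    "mirrors": {"fsmirror": {"root": "fsmirror", "prefix": "MIRROR",
                             "output_name": "out", "output_format": "parquet"}},
}
stamp = run_id(datetime(2024, 2, 27, 16, 2, 21))
relative = f"fsmirror/tests/test_mirror/{stamp}/some_task"
uri = full_output_uri(config, "fsmirror", relative)
print(uri)
```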
Raw data
{
"_id": null,
"home_page": "https://github.com/wesmadrigal/fsmirror",
"name": "fsmirror",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "metadata management,filesystems",
"author": "Wes Madrigal",
"author_email": "wes@kurve.ai",
"download_url": "https://files.pythonhosted.org/packages/5d/7d/083e6a3209da7fa3a8695024667fc8846174a53ee5e0baca0b773421bd7a/fsmirror-0.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A metadata management package based on filesystem mirroring.",
"version": "0.4",
"project_urls": {
"Homepage": "https://github.com/wesmadrigal/fsmirror",
"Issue Tracker": "https://github.com/wesmadrigal/fsmirror/issues",
"Source": "http://github.com/wesmadrigal/fsmirror"
},
"split_keywords": [
"metadata management",
"filesystems"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5d7d083e6a3209da7fa3a8695024667fc8846174a53ee5e0baca0b773421bd7a",
"md5": "f222d13a61b7a29af3db3bd260e09746",
"sha256": "dfa75e2f019fd991f24ace1201ac8b1ff8546b1623f893f52858f59cad1975d5"
},
"downloads": -1,
"filename": "fsmirror-0.4.tar.gz",
"has_sig": false,
"md5_digest": "f222d13a61b7a29af3db3bd260e09746",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 5001,
"upload_time": "2024-02-28T02:15:36",
"upload_time_iso_8601": "2024-02-28T02:15:36.496172Z",
"url": "https://files.pythonhosted.org/packages/5d/7d/083e6a3209da7fa3a8695024667fc8846174a53ee5e0baca0b773421bd7a/fsmirror-0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-28 02:15:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wesmadrigal",
"github_project": "fsmirror",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "fsmirror"
}