# fsmirror

- Version: 0.4
- Summary: A metadata management package based on filesystem mirroring.
- Author: Wes Madrigal
- License: MIT
- Keywords: metadata management, filesystems
- Home page: https://github.com/wesmadrigal/fsmirror
- Uploaded to PyPI: 2024-02-28 02:15:36

## Installation
```bash
pip install fsmirror
```

### Functionality
`fsmirror` mirrors a project's filesystem layout for metadata tracking. The location where an
artifact is stored, in a local filesystem or an object store, follows directly from the path of
the code that generated it, so every output path tells you which code produced it.

### Example
- Code location: `project/etl/my_etl_task.py::LiftDataTask`
- Associated `fsmirror` local output: `project/etl/my_etl_task/LiftDataTask/out.parquet`
- Associated `fsmirror` S3 output: `s3://my.bucket/project/etl/my_etl_task/LiftDataTask/out.parquet`
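The mapping in this example amounts to a small string transformation. The sketch below is hypothetical (`mirrored_output_path` is not an fsmirror function) and assumes references of the form `path/to/module.py::Name`, the `out` / `parquet` defaults from the example, and an S3 bucket when `bucket` is given; it omits any run id.

```python
from pathlib import PurePosixPath
from typing import Optional


def mirrored_output_path(code_ref: str, output_name: str = "out",
                         output_format: str = "parquet",
                         bucket: Optional[str] = None) -> str:
    """Hypothetical helper: map 'path/to/module.py::Name' to its mirrored artifact path."""
    module_path, _, name = code_ref.partition("::")
    if not name:
        raise ValueError("expected a reference of the form 'path/to/module.py::Name'")
    # Drop the .py suffix so the module becomes a directory, then nest the object name
    # and the output file inside it.
    base = PurePosixPath(module_path).with_suffix("") / name / f"{output_name}.{output_format}"
    # Assumption: a bucket means S3; otherwise return the plain relative path.
    return f"s3://{bucket}/{base}" if bucket else str(base)
```

For instance, `mirrored_output_path("project/etl/my_etl_task.py::LiftDataTask", bucket="my.bucket")` yields the S3 path listed above.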


### Usage

* Create a configuration file like the one in `examples/example_config.yml`
* Set the config path:
```bash
export FSMIRROR_CONFIG_PATH=/your/project/path/config.yml
```
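Before using fsmirror, it can help to fail early when the variable is unset or points to the wrong place. The helper below is a hypothetical, standard-library-only sketch; `find_config_path` is not part of fsmirror.

```python
import os
from pathlib import Path


def find_config_path(env_var: str = "FSMIRROR_CONFIG_PATH") -> Path:
    """Hypothetical check: return the config path named by env_var or raise a clear error."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; export it to point at your config.yml")
    # Expand a leading ~ so values like ~/project/config.yml also work.
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to {path}, which is not a file")
    return path
```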

The config file should look like the example:
```yaml
# artifacts
storage:
  # local, s3, gcs, blob
  provider: s3
  # root file path, bucket, etc.
  tenant: test.bucket
  # prefix; if 'MIRROR', the project filesystem layout is mirrored
  namespace: MIRROR


# Each mirror corresponds to a subdirectory
# of your project. For example, if your
# orchestrator codebase lives at:
#
# /opt/orchestrator
#
# you would add an "orchestrator" mirror
# with root: orchestrator, following the
# same pattern as the mirrors below.
mirrors:
  fsmirror:
    # directory or subdirectory to split on
    root: fsmirror
    prefix: MIRROR
    output_name: out
    output_format: parquet

  aipipeline:
    root: aipipeline
    prefix: MIRROR
    output_name: out
    output_format: pkl
```
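Once the YAML is parsed into a dictionary (such as the one `load_config()` returns in the usage session later in this README), a quick sanity check can catch missing keys early. This is a hypothetical sketch: `check_config` and `output_filename` are not fsmirror functions, and treating every key from the example as required is an assumption; the package itself may be more lenient.

```python
from typing import Any, Dict

# Providers and keys taken from the example config; requiring all of them is an assumption.
KNOWN_PROVIDERS = {"local", "s3", "gcs", "blob"}
STORAGE_KEYS = ("provider", "tenant", "namespace")
MIRROR_KEYS = ("root", "prefix", "output_name", "output_format")


def check_config(config: Dict[str, Any]) -> None:
    """Hypothetical pre-flight check; raises ValueError on the first problem found."""
    storage = config.get("storage")
    if not isinstance(storage, dict):
        raise ValueError("config needs a 'storage' mapping")
    missing = [key for key in STORAGE_KEYS if key not in storage]
    if missing:
        raise ValueError(f"storage is missing keys: {missing}")
    if storage["provider"] not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown storage provider: {storage['provider']!r}")
    mirrors = config.get("mirrors")
    if not isinstance(mirrors, dict) or not mirrors:
        raise ValueError("config needs at least one entry under 'mirrors'")
    for name, mirror in mirrors.items():
        if not isinstance(mirror, dict):
            raise ValueError(f"mirror {name!r} must be a mapping")
        missing = [key for key in MIRROR_KEYS if key not in mirror]
        if missing:
            raise ValueError(f"mirror {name!r} is missing keys: {missing}")


def output_filename(config: Dict[str, Any], mirror: str) -> str:
    """File name built from a mirror's output_name and output_format, e.g. 'out.parquet'."""
    settings = config["mirrors"][mirror]
    return f"{settings['output_name']}.{settings['output_format']}"
```

With the example config, `output_filename(config, "aipipeline")` gives `out.pkl`.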

Use `fsmirror` to decide where artifacts are stored. The following example session shows how it
is meant to be used:

```python
>>> from fsmirror import FSMirror, load_config
>>> from test_mirror import some_task  # a task defined in the package's tests/test_mirror.py
>>> config = load_config()  # loads the config file set via FSMIRROR_CONFIG_PATH
>>> config
{'storage': {'provider': 's3', 'tenant': 'test.bucket', 'namespace': 'MIRROR'}, 'mirrors': {'fsmirror': {'root': 'fsmirror', 'prefix': 'MIRROR', 'output_name': 'out', 'output_format': 'parquet'}, 'aipipeline': {'root': 'aipipeline', 'prefix': 'MIRROR', 'output_name': 'out', 'output_format': 'pkl'}}}
>>> fm = FSMirror(config=config, mirror='fsmirror')
>>> fm.mirror_relative(some_task)  # by default the path includes a run id (20240227160221 here)
'fsmirror/tests/test_mirror/20240227160221/some_task'
>>> fm.mirror_relative(some_task, with_id=False)
'fsmirror/tests/test_mirror/some_task'
>>> fm.mirror_full(some_task)
's3://test.bucket/fsmirror/tests/test_mirror/20240227160221/some_task'
>>> fm.mirror_full_output(some_task)
's3://test.bucket/fsmirror/tests/test_mirror/20240227160221/some_task/out.parquet'
```
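The session above shows the shape of the generated paths but not how they are built. As a rough illustration only, the hypothetical helpers below (`sketch_mirror_relative`, `sketch_mirror_full`, `sketch_mirror_full_output`) reproduce those strings from a callable's source file. They assume the run id is a `YYYYMMDDHHMMSS` timestamp, that the relative path starts at the last directory named after the mirror's `root`, that `s3` and `gcs` map to the `s3://` and `gs://` URI schemes, and that `prefix` and `namespace` are both `MIRROR` as in the example config. fsmirror's actual implementation may differ.

```python
import inspect
from datetime import datetime
from pathlib import PurePath, PurePosixPath
from typing import Callable, Optional


def sketch_mirror_relative(obj: Callable, root: str, with_id: bool = True,
                           run_id: Optional[str] = None,
                           source_file: Optional[str] = None) -> str:
    """Hypothetical stand-in for FSMirror.mirror_relative (not the package's code)."""
    # Find the file that defines obj, unless the caller passes the path explicitly.
    source = source_file or inspect.getsourcefile(obj)
    if source is None:
        raise ValueError(f"cannot locate the source file of {obj!r}")
    parts = PurePath(source).with_suffix("").parts
    if root not in parts:
        raise ValueError(f"{root!r} does not appear in {source}")
    # Keep the path from the last occurrence of the mirror root onward.
    start = len(parts) - 1 - parts[::-1].index(root)
    segments = list(parts[start:])
    if with_id:
        # Assumption: the run id is a YYYYMMDDHHMMSS timestamp, like 20240227160221.
        segments.append(run_id or datetime.now().strftime("%Y%m%d%H%M%S"))
    segments.append(obj.__name__)
    return "/".join(segments)


def sketch_mirror_full(relative: str, provider: str, tenant: str) -> str:
    """Prefix a relative mirror path with the storage location (URI schemes are assumed)."""
    if provider == "local":
        # For local storage, treat the tenant as a root directory.
        return str(PurePosixPath(tenant) / relative)
    schemes = {"s3": "s3://", "gcs": "gs://"}
    if provider not in schemes:
        raise NotImplementedError(f"no URI scheme assumed for provider {provider!r}")
    return f"{schemes[provider]}{tenant}/{relative}"


def sketch_mirror_full_output(full: str, output_name: str, output_format: str) -> str:
    """Append the configured output file name, e.g. out.parquet."""
    return f"{full}/{output_name}.{output_format}"
```

Passing `source_file` explicitly keeps the sketch testable; in normal use the helper would look up the defining file with `inspect.getsourcefile`.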



            
