microcosm-sagemaker 2023.3.3503

* Summary: Opinionated machine learning organization and configuration
* Home page: https://github.com/globality-corp/microcosm-sagemaker
* Author: Globality Engineering
* Requires Python: >=3.6
* Keywords: microcosm
* Uploaded: 2023-01-18 11:18:24
# microcosm-sagemaker
Opinionated machine learning with SageMaker

## Usage
For best practices, see
[`cookiecutter-microcosm-sagemaker`](https://github.com/globality-corp/cookiecutter-microcosm-sagemaker).

## Profiling
Make sure `pyinstrument` is installed, either via `pip install pyinstrument` or by installing `microcosm-sagemaker` with the `profiling` extra:

```
pip install -e '.[profiling]'
```

To enable profiling of the app, use the `--profile` flag with `runserver`:

```
runserver --profile
```

The service will log that it is in profiling mode and announce the directory to which it is exporting. Each call to the endpoint will be profiled, and its results will be stored in a time-tagged HTML file in the profiling directory.

## Experiment Tracking
To use Weights & Biases, install `microcosm-sagemaker` with the `wandb` extra dependency:

```
pip install -e '.[wandb]'
```

To enable experiment tracking in an ML repository:

* Choose the experiment tracking store for your ML model. Currently, only `wandb` is supported. To enable it, add `wandb` to `graph.use()` in `app_hooks/train/app.py` and `app_hooks/evaluate/app.py` (see the sketch after this list).

* Add the `wandb` API key to the environment variables that CircleCI injects into the Docker instance by visiting `https://circleci.com/gh/globality-corp/<MODEL-NAME>/edit#env-vars` and adding `WANDB_API_KEY` as an environment variable.

* `microcosm-sagemaker` automatically adds the config of the active bundle and its dependents to the `wandb` run config.

* To report a static metric:

```
class MyClassifier(Bundle):
    ...

    def fit(self, input_data):
        ...
        self.experiment_metrics.log_static(<metric_name>=<metric_value>)
```

* To report a time-series metric:

```
class MyClassifier(Bundle):
    ...

    def fit(self, input_data):
        ...
        self.experiment_metrics.log_timeseries(
            <metric_name>=<metric_value>,
            step=<step_number>
        )
```

Note that the `step` keyword argument must be provided when logging time-series metrics.
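
The setup steps above mention adding `wandb` to `graph.use()` in the app hooks. The following is a minimal sketch of what that could look like; the app name, the other components passed to `graph.use()`, and the factory signature are assumptions that will differ per repository:

```
# app_hooks/train/app.py: illustrative sketch, not the generated file
from microcosm.api import create_object_graph


def create_app(debug=False, testing=False):
    graph = create_object_graph("my_model", debug=debug, testing=testing)
    graph.use(
        "my_classifier",  # hypothetical bundle component
        "wandb",          # enables Weights & Biases experiment tracking
    )
    graph.lock()
    return graph
```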

## Artifact Tests

If you want to report your artifact tests to wandb, add the following line to the top of your `conftest.py`. 
For more information on using plugins in pytest, see [here](https://docs.pytest.org/en/6.2.x/plugins.html#requiring-loading-plugins-in-a-test-module-or-conftest-file).

```
pytest_plugins = 'pytest_sagemaker'
```

This should be generated by `globality-build`, but in case it is not, also make sure to run artifact
tests with `--capture=tee-sys`. This allows pytest to both capture and display stdout.
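
For example, an illustrative invocation (the test path is a placeholder and depends on how the repository lays out its artifact tests):

```
pytest --capture=tee-sys <path-to-artifact-tests>
```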

## Reproducibility

As recommended [here](https://pytorch.org/docs/stable/notes/randomness.html), we seed the Python,
NumPy and PyTorch random number generators and force PyTorch operations to be deterministic. See
`microcosm-sagemaker/random.py` for details.
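
The snippet below is an illustrative sketch of that kind of seeding, following the linked PyTorch guidance; it is not the actual contents of `random.py`, and the function name and default seed are assumptions:

```
import random

import numpy as np
import torch


def seed_everything(seed=42):
    # Seed the Python, NumPy and PyTorch random number generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Force deterministic behaviour from cuDNN-backed operations.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```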

## Distributed training

To support "distributed" training with multiple processes (as with PyTorch's `DistributedDataParallel`),
we detect whether the current process is a "worker process" (a non-master member of a process group).
Worker processes are prevented from communicating with the outside world, i.e. from writing logs and
saving artifacts.
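
As an illustration only (the library's actual detection logic may differ), such a check typically inspects the process's rank within the distributed process group:

```
import os

import torch.distributed as dist


def is_worker_process():
    # Illustrative check: a process is a "worker" if it belongs to an
    # initialized process group and is not rank 0 (the master).
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank() != 0
    # Fall back to the RANK environment variable set by common launchers.
    return int(os.environ.get("RANK", "0")) != 0
```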

## Training Bundles on Separate Processes

In some circumstances, it is beneficial for the training of a bundle to take place on a separate
process. An example of this is when using DDP (or a DDP-related mechanism such as DeepSpeed) for multi-GPU
training. This training mode may be enabled by setting the `spawn_to_fit` property of the `Bundle` to true
(note that the main process will block while the spawned process runs `fit`; the purpose is to allow for
more efficient parallelism in the training of each individual bundle, not for the concurrent training of
multiple bundles).
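
A minimal sketch of how a bundle might opt in is below; whether `spawn_to_fit` is set as a plain class attribute or overridden as a property depends on the `Bundle` base class, and the training body is a placeholder:

```
class MyDistributedClassifier(Bundle):
    # Run fit in a spawned process; the main process blocks until it finishes.
    spawn_to_fit = True

    def fit(self, input_data):
        # e.g. launch DDP / DeepSpeed training across multiple GPUs here
        ...
```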

            
