osml-model-runner

Name: osml-model-runner
Version: 2.1.1
Summary: Application to run large scale imagery against AI/ML models
Author: Amazon Web Services
Requires Python: >=3.8
Upload time: 2024-05-23 18:48:31
Requirements: none recorded
License: MIT No Attribution. Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# OSML Model Runner

This package contains an application used to orchestrate the execution of ML models on large satellite images. The
application monitors an input queue for processing requests, decomposes the image into a set of smaller regions and
tiles, invokes an ML model endpoint with each tile, and finally aggregates all the results into a single output. The
application itself has been containerized and is designed to run on a distributed cluster of machines collaborating
across instances to process images as quickly as possible.

### Table of Contents
* [Getting Started](#getting-started)
  * [Key Design Concepts](#key-design-concepts)
    * [Image Tiling](#image-tiling)
    * [Geolocation](#geolocation)
    * [Merging Results from Overlap Regions](#merging-results-from-overlap-regions)
  * [Metrics and Logs](#metrics-and-logs)
  * [Package Layout](#package-layout)
  * [Prerequisites](#prerequisites)
  * [Development Environment](#development-environment)
  * [Running ModelRunner](#running-modelrunner)
  * [Infrastructure](#infrastructure)
    * [S3](#s3)
  * [Code Documentation](#code-documentation)
* [Support & Feedback](#support--feedback)
* [Security](#security)
* [License](#license)


## Getting Started

### Key Design Concepts

The [Guidance for Model Developers](./GUIDE_FOR_MODEL_DEVELOPERS.md) document contains details of how the
OversightML ModelRunner application interacts with containerized computer vision (CV) models, along with examples of the
GeoJSON formatted inputs it expects and outputs it generates. At a high level, this application provides the following functions:

#### Image Tiling

The images to be processed by this application are expected to range anywhere from 500MB to 500GB in size. The upper
bound is consistently growing as sensors become increasingly capable of collecting larger swaths of high resolution
data. To handle these images, the application applies two levels of tiling. The first is region-based tiling, in which
the application breaks the full image into pieces small enough for a single machine to handle. All regions after
the first are placed on a second queue so other model runners can begin processing them in parallel. The second
tiling phase breaks each region into the individual chunks that are sent to the ML models. Many ML model
containers are configured to process images between 512 and 2048 pixels on a side, so fully processing a
large 200,000 x 200,000 pixel satellite image can result in more than 10,000 requests to those model endpoints.
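
The tile grid itself is straightforward to compute. A minimal sketch (function and parameter names are illustrative, not this application's actual API):

```python
def tile_windows(width, height, tile_size, overlap):
    """Compute (xoff, yoff, xsize, ysize) windows covering an image,
    with adjacent windows overlapping by `overlap` pixels."""
    step = tile_size - overlap
    xs = range(0, max(width - overlap, 1), step)
    ys = range(0, max(height - overlap, 1), step)
    # Edge tiles are clipped to the image boundary.
    return [(x, y, min(tile_size, width - x), min(tile_size, height - y))
            for y in ys for x in xs]
```

With a 2048-pixel tile and 50-pixel overlap, a 200,000 x 200,000 image yields a grid of over 10,000 windows, consistent with the request volume described above.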

The images themselves are assumed to reside in S3, compressed and encoded in a way that facilitates piecewise access
to tiles without downloading the entire image. The GDAL library, a widely used open source implementation of GIS data
tools, can read images directly from S3, using partial range reads to download only the portion of the overall image
needed to process a region.
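
For example, GDAL exposes S3 objects through its `/vsis3/` virtual filesystem, so a window of pixels can be read without fetching the whole file. A minimal sketch (assumes GDAL's Python bindings are installed and AWS credentials are configured; the helper names are illustrative):

```python
def vsis3_path(bucket, key):
    """Map an S3 bucket/key pair to GDAL's /vsis3/ virtual filesystem path."""
    return f"/vsis3/{bucket}/{key}"

def read_window(bucket, key, xoff, yoff, xsize, ysize):
    """Read one pixel window from an image in S3; GDAL issues HTTP range
    requests under the hood rather than downloading the entire object."""
    from osgeo import gdal  # requires GDAL built with /vsis3/ support
    dataset = gdal.Open(vsis3_path(bucket, key))
    return dataset.ReadAsArray(xoff, yoff, xsize, ysize)
```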

#### Geolocation

Most ML models do not contain the photogrammetry libraries needed to geolocate objects detected in an
image. ModelRunner converts these detections into geospatial features using the sensor models described
in the image metadata. The details of the photogrammetry operations are in the
[osml-imagery-toolkit](https://github.com/aws-solutions-library-samples/osml-imagery-toolkit) library.
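
As a simplified illustration of the idea (real sensor models are far more involved and live in osml-imagery-toolkit), a GDAL-style six-element affine geotransform maps pixel coordinates to georeferenced coordinates:

```python
def pixel_to_geo(geotransform, col, row):
    """Apply a GDAL-style affine geotransform (origin_x, px_width, row_rot,
    origin_y, col_rot, px_height) to map a pixel (col, row) to geo (x, y).
    This is only a stand-in for full photogrammetric sensor models."""
    gt = geotransform
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y
```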

#### Merging Results from Overlap Regions

Many of the ML algorithms we expect to run involve object detection or feature extraction. Features of interest
can fall on tile boundaries and be missed by the ML models, which would only see a fractional object. This
application mitigates that by allowing requests to specify an overlap region size, which should be tuned to the
expected size of the objects. Each tile sent to the ML model is cut from the full image so that it overlaps its
neighbors by the specified amount. The results from each tile are then aggregated, with a Non-Maximal Suppression
algorithm used to eliminate duplicates where an object in an overlap region was picked up by multiple model runs.
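
A greedy Non-Maximal Suppression pass over the merged detections can be sketched as follows (a simplified illustration, not the application's actual implementation; boxes are pixel-space `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box and drop any lower-scoring
    box that overlaps a kept box beyond the threshold.
    `detections` is a list of (box, score) pairs."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept
```

Two detections of the same object from adjacent overlapping tiles have high IoU, so only the higher-confidence one survives.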

### Metrics and Logs

As the application runs, key performance metrics and detailed logging information are output to [CloudWatch](https://aws.amazon.com/cloudwatch/).
A detailed description of what information is tracked along with example dashboards can be found in
[METRICS_AND_DASHBOARDS.md](./METRICS_AND_DASHBOARDS.md).

### Package Layout

* **/src**: The Python implementation of this application.
* **/test**: Unit tests, implemented using [pytest](https://docs.pytest.org).
* **/bin**: The entry point for the containerized application.
* **/scripts**: Utility scripts that are not part of the main application but are frequently used in development and testing.

### Prerequisites

First, ensure you have installed the following tools locally:

- [docker](https://docs.docker.com/get-docker/)
- [tox](https://tox.wiki/en/latest/installation.html)
- [osml-cdk-constructs](https://github.com/aws-solutions-library-samples/osml-cdk-constructs) deployed into your AWS account

### Development Environment

To run the container in a build/test mode and work inside it, mount the source directory and open a shell
(substitute `<image-name>` with the tag of your locally built container image):

```shell
docker run -it -v `pwd`/:/home/ --entrypoint /bin/bash <image-name>
```

### Running ModelRunner

To start a job, place an ImageRequest on the ImageRequestQueue.

Sample ImageRequest:
```json
{
    "jobArn": "arn:aws:oversightml:<REGION>:<ACCOUNT>:ipj/<job_name>",
    "jobName": "<job_name>",
    "jobId": "<job_id>",
    "imageUrls": ["<image_url>"],
    "outputs": [
        {"type": "S3", "bucket": "<result_bucket_arn>", "prefix": "<job_name>/"},
        {"type": "Kinesis", "stream": "<result_stream_arn>", "batchSize": 1000}
    ],
    "imageProcessor": {"name": "<sagemaker_endpoint>", "type": "SM_ENDPOINT"},
    "imageProcessorTileSize": 2048,
    "imageProcessorTileOverlap": 50,
    "imageProcessorTileFormat": "< NITF | JPEG | PNG | GTIFF >",
    "imageProcessorTileCompression": "< NONE | JPEG | J2K | LZW >"
}
```
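
A request like the sample above can be assembled and placed on the queue with boto3. A hedged sketch (the queue URL, argument values, and helper names below are placeholders, not part of this package):

```python
import json

def build_image_request(job_name, job_id, image_url, endpoint, bucket, region, account):
    """Assemble an ImageRequest body matching the sample above.
    All argument values are caller-supplied placeholders."""
    return {
        "jobArn": f"arn:aws:oversightml:{region}:{account}:ipj/{job_name}",
        "jobName": job_name,
        "jobId": job_id,
        "imageUrls": [image_url],
        "outputs": [
            {"type": "S3", "bucket": bucket, "prefix": f"{job_name}/"},
        ],
        "imageProcessor": {"name": endpoint, "type": "SM_ENDPOINT"},
        "imageProcessorTileSize": 2048,
        "imageProcessorTileOverlap": 50,
    }

def submit(queue_url, request):
    """Place the request on the ImageRequestQueue (assumes boto3 is
    installed and AWS credentials are configured)."""
    import boto3
    sqs = boto3.client("sqs")
    return sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(request))
```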

### Infrastructure

#### S3
When configuring S3 buckets for images and results, be sure to follow [S3 Security Best Practices](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).

### Code Documentation

You can find documentation for this library in the `./doc` directory. Sphinx is used to construct a searchable HTML
version of the API documents.

```shell
tox -e docs
```

## Support & Feedback

To post feedback, submit feature ideas, or report bugs, please use the [Issues](https://github.com/aws-solutions-library-samples/osml-model-runner/issues) section of this GitHub repo.

If you are interested in contributing to OversightML Model Runner, see the [CONTRIBUTING](CONTRIBUTING.md) guide.

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

Licensed under MIT No Attribution (MIT-0). See [LICENSE](LICENSE).
