| Field           | Value                                                                     |
| --------------- | ------------------------------------------------------------------------- |
| Name            | stactask                                                                  |
| Version         | 0.6.0                                                                     |
| Summary         | Class interface for running custom algorithms and workflows on STAC Items |
| Upload time     | 2024-09-19 18:54:19                                                       |
| Author          | Matthew Hanson                                                            |
| Maintainer      | Ian Cooke                                                                 |
| Requires Python | >=3.9                                                                     |
| License         | Apache-2.0                                                                |
| Keywords        | pystac, imagery, raster, catalog, stac                                    |

<!-- omit from toc -->
# STAC Task (stac-task)

[![Build Status](https://github.com/stac-utils/stac-task/workflows/CI/badge.svg?branch=main)](https://github.com/stac-utils/stac-task/actions/workflows/continuous-integration.yml)
[![PyPI version](https://badge.fury.io/py/stac-task.svg)](https://badge.fury.io/py/stac-task)
[![Documentation Status](https://readthedocs.org/projects/stac-task/badge/?version=latest)](https://stac-task.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/stac-utils/stac-task/branch/main/graph/badge.svg)](https://codecov.io/gh/stac-utils/stac-task)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

- [Quickstart for Creating New Tasks](#quickstart-for-creating-new-tasks)
- [Task Input](#task-input)
  - [ProcessDefinition Object](#processdefinition-object)
    - [UploadOptions Object](#uploadoptions-object)
      - [path\_template](#path_template)
      - [collections](#collections)
    - [tasks](#tasks)
    - [TaskConfig Object](#taskconfig-object)
- [Full Process Definition Example](#full-process-definition-example)
- [Migration](#migration)
  - [0.4.x -\> 0.5.x](#04x---05x)
  - [0.5.x -\> 0.6.0](#05x---060)
- [Development](#development)
- [Contributing](#contributing)

This Python library provides the `Task` class, which is used to create custom tasks based
on a "STAC In, STAC Out" approach. The `Task` class acts as a wrapper around custom code and provides
several convenience methods for modifying STAC Items, creating derived Items, and providing a CLI.

This library is based on a [branch of cirrus-lib](https://github.com/cirrus-geo/cirrus-lib/tree/features/task-class) but aims to be more generic.

## Quickstart for Creating New Tasks

```python
from typing import Any

from stactask import Task, DownloadConfig

class MyTask(Task):
    name = "my-task"
    description = "this task does it all"

    def validate(self) -> bool:
        return len(self.items) == 1

    def process(self, **kwargs: Any) -> list[dict[str, Any]]:
        item = self.items[0]

        # download a datafile
        item = self.download_item_assets(
            item,
            config=DownloadConfig(include=['data'])
        )

        # operate on the local file to create a new asset
        item = self.upload_item_assets_to_s3(item)

        # this task returns a single item
        return [item.to_dict(include_self_link=True, transform_hrefs=False)]
```

## Task Input

| Field Name | Type              | Description               |
| ---------- | ----------------- | ------------------------- |
| type       | string            | Must be FeatureCollection |
| features   | [Item]            | A list of STAC `Item`     |
| process    | ProcessDefinition | A Process Definition      |
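
The fields above can be checked with a few lines of plain Python. This is a minimal sketch and not part of stac-task: `check_task_input` is a hypothetical helper, and treating `process` as optional is an assumption, since the table does not mark any field as required.

```python
from typing import Any


def check_task_input(payload: dict[str, Any]) -> list[str]:
    """Return the problems found in a task input payload (empty list if none)."""
    problems = []
    if payload.get("type") != "FeatureCollection":
        problems.append("type must be 'FeatureCollection'")
    if not isinstance(payload.get("features"), list):
        problems.append("features must be a list of STAC Items")
    # Assumption: a missing process field is acceptable and treated as empty.
    if not isinstance(payload.get("process", {}), dict):
        problems.append("process must be a ProcessDefinition object")
    return problems


payload = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "LC08_example", "properties": {}, "assets": {}}
    ],
    "process": {"description": "My process configuration", "tasks": {}},
}
print(check_task_input(payload))
```

An empty list means the payload has the expected top-level shape; it says nothing about whether each feature is a valid STAC Item.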

### ProcessDefinition Object

A STAC task can be given additional configuration via the `process` field of the input
ItemCollection.

| Field Name     | Type          | Description                                                                                                                                                                    |
| -------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| description    | string        | Optional description of the process configuration                                                                                                                              |
| upload_options | UploadOptions | Options used when uploading assets to a remote server                                                                                                                          |
| tasks          | Map<str, Map> | Dictionary of task configurations. A list of [task configurations](#taskconfig-object) is supported for backwards compatibility reasons, but a dictionary should be preferred. |

#### UploadOptions Object

| Field Name    | Type          | Description                                                                             |
| ------------- | ------------- | --------------------------------------------------------------------------------------- |
| path_template | string        | **REQUIRED** A string template for specifying the location of uploaded assets           |
| public_assets | [str]         | A list of asset keys that should be marked as public when uploaded                      |
| headers       | Map<str, str> | Key-value headers to send when uploading data to S3                                     |
| collections   | Map<str, str> | A mapping of output collection name to a JSONPath pattern (for matching Items)          |
| s3_urls       | bool          | Controls whether the final published URLs are S3 (s3://*bucket*/*key*) or HTTPS URLs    |

##### path_template

The path_template string is a way to control the output location of uploaded assets from a STAC Item using metadata from the Item itself.
The template can contain fixed strings along with variables used for substitution.
See [the PySTAC documentation for `LayoutTemplate`](https://pystac.readthedocs.io/en/stable/api/layout.html#pystac.layout.LayoutTemplate) for a list of supported template variables and their meaning.
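
To illustrate how such a template expands, here is a minimal sketch that uses only the standard library. It is a simplified stand-in for PySTAC's `LayoutTemplate`, not its implementation: `render_path` is a hypothetical helper, it supports only the five variables shown, and its number formatting may differ from what PySTAC produces.

```python
from datetime import datetime
from string import Template
from typing import Any


def render_path(template: str, item: dict[str, Any]) -> str:
    """Fill a path template from Item metadata (simplified stand-in)."""
    # Rewrite a trailing "Z" so fromisoformat accepts the value on Python 3.9.
    raw_datetime = item["properties"]["datetime"].replace("Z", "+00:00")
    dt = datetime.fromisoformat(raw_datetime)
    values = {
        "id": item["id"],
        "collection": item["collection"],
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
    }
    return Template(template).substitute(values)


item = {
    "id": "LC08_example",
    "collection": "landsat-c2l2",
    "properties": {"datetime": "2024-09-19T18:54:19Z"},
}
path = render_path("s3://my-bucket/${collection}/${year}/${month}/${day}/${id}", item)
print(path)
```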

##### collections

The collections dictionary maps each output collection ID to a JSONPath pattern that is matched against STAC Items.
At the end of processing, before the final STAC Items are returned, the Task class can be used to assign
each Item to a collection ID. Each Item is compared against the JSONPath pattern of every collection,
and the first pattern that matches sets the Item's collection ID to the corresponding key.

For example:

```json
"collections": {
    "landsat-c2l2": "$[?(@.id =~ 'LC08.*')]"
}
```

In this example, the task will set any STAC Items that have an ID beginning with "LC08" to the `landsat-c2l2` collection.

See [JSONPath Online Evaluator](https://jsonpath.com) to experiment with JSONPath and [regex101](https://regex101.com) to experiment with regex.
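
The first-match rule can be sketched without a JSONPath engine. The example below is a simplification under stated assumptions: it matches a plain regular expression against the Item ID only, whereas the library evaluates full JSONPath patterns, and `assign_collection` is a hypothetical helper name.

```python
import re
from typing import Any


def assign_collection(
    item: dict[str, Any], id_patterns: dict[str, str]
) -> dict[str, Any]:
    """Set item["collection"] to the first collection whose pattern matches the ID."""
    for collection_id, pattern in id_patterns.items():
        if re.match(pattern, item["id"]):
            item["collection"] = collection_id
            break  # first match wins; later patterns are not consulted
    return item


# The regular expression is taken from the JSONPath filter in the example above.
id_patterns = {"landsat-c2l2": r"LC08.*"}

landsat = assign_collection({"id": "LC08_L2SP_example"}, id_patterns)
other = assign_collection({"id": "S2A_example", "collection": "unchanged"}, id_patterns)
print(landsat["collection"], other["collection"])
```

Items that match no pattern keep whatever collection they already had.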

#### tasks

The tasks field is a dictionary with an optional key for each task. If a task's key is present, its value is
a dictionary of parameters that are passed as keyword arguments to the Task's `process` function.
The documentation for each task will provide the list of available parameters.

```json
{
    "tasks": {
        "task-a": {
            "param1": "value1"
        },
        "task-c": {
            "param2": "value2"
        }
    }
}
```

In the example above, a task named `task-a` would be passed the keyword argument `param1="value1"`, while `task-c`
would be passed `param2="value2"`. A `task-b`, if run, would not be passed any keyword arguments.
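
The lookup described above amounts to a dictionary access followed by keyword unpacking. A minimal sketch, in which `parameters_for` and the echoing `process` function are hypothetical stand-ins rather than stac-task APIs:

```python
from typing import Any

process_definition = {
    "tasks": {
        "task-a": {"param1": "value1"},
        "task-c": {"param2": "value2"},
    }
}


def parameters_for(task_name: str, definition: dict[str, Any]) -> dict[str, Any]:
    """Return the parameters configured for task_name, or {} if it has no entry."""
    return definition.get("tasks", {}).get(task_name, {})


def process(**kwargs: Any) -> dict[str, Any]:
    """Stand-in for a Task's process method; it echoes its keyword arguments."""
    return kwargs


for name in ("task-a", "task-b", "task-c"):
    print(name, process(**parameters_for(name, process_definition)))
```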

#### TaskConfig Object

**DEPRECATED**: `tasks` should be a dictionary of parameters, with task names as keys. See [tasks](#tasks) for more information.

A Task Configuration contains information for running a specific task.

| Field Name | Type          | Description                                                                          |
| ---------- | ------------- | ------------------------------------------------------------------------------------ |
| name       | str           | **REQUIRED** Name of the task                                                        |
| parameters | Map<str, str> | Dictionary of keyword parameters that are passed to the Task's `process` function    |
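
Payloads that still use the deprecated list form can be rewritten into the preferred dictionary form. A minimal sketch, assuming each list entry carries the `name` and `parameters` fields from the table above; `tasks_list_to_dict` is a hypothetical helper, not a stac-task function:

```python
from typing import Any


def tasks_list_to_dict(tasks: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Convert deprecated TaskConfig objects into a dictionary keyed by task name."""
    return {config["name"]: config.get("parameters", {}) for config in tasks}


legacy_tasks = [
    {"name": "task-a", "parameters": {"param1": "value1"}},
    {"name": "task-c", "parameters": {"param2": "value2"}},
]
print(tasks_list_to_dict(legacy_tasks))
```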

## Full Process Definition Example

Process definitions are sometimes called "Payloads":

```json
{
    "description": "My process configuration",
    "upload_options": {
        "path_template": "s3://my-bucket/${collection}/${year}/${month}/${day}/${id}",
        "collections": {
            "landsat-c2l2": "$[?(@.id =~ 'LC08.*')]"
        }
    },
    "tasks": {
        "task-name": {
            "param": "value"
        }
    }
}
```

## Migration

### 0.4.x -> 0.5.x

In 0.5.0, the previous use of fsspec to download Item Assets has been replaced with
the stac-asset library. This has necessitated a change in the parameters
that the download methods accept.

The primary change is that the Task methods `download_item_assets` and
`download_items_assets` (items plural) now accept fewer explicit and implicit
(kwargs) parameters.

Previously, the methods looked like:

```python
    def download_item_assets(
        self,
        item: Item,
        path_template: str = "${collection}/${id}",
        keep_original_filenames: bool = False,
        **kwargs: Any,
    ) -> Item:
```

but now look like:

```python
    def download_item_assets(
        self,
        item: Item,
        path_template: str = "${collection}/${id}",
        config: Optional[DownloadConfig] = None,
    ) -> Item:
```

Similarly, the functions in the `asset_io` module were previously:

```python
async def download_item_assets(
    item: Item,
    assets: Optional[list[str]] = None,
    save_item: bool = True,
    overwrite: bool = False,
    path_template: str = "${collection}/${id}",
    absolute_path: bool = False,
    keep_original_filenames: bool = False,
    **kwargs: Any,
) -> Item:
```

and are now:

```python
async def download_item_assets(
    item: Item,
    path_template: str = "${collection}/${id}",
    config: Optional[DownloadConfig] = None,
) -> Item:
```

Additionally, `kwargs` keys were set to pass configuration through to fsspec. The most common
parameter was `requester_pays`, to set the Requester Pays flag in AWS S3 requests.

Many of these parameters can be directly translated into configuration passed in a
`DownloadConfig` object, which is just a wrapper over the `stac_asset.Config` object.

These parameters migrate to `DownloadConfig` as follows:

- `assets`: set `include`
- `requester_pays`: set `s3_requester_pays` to `True`
- `keep_original_filenames`: set `file_name_strategy` to
  `FileNameStrategy.FILE_NAME` if True or `FileNameStrategy.KEY` if False
- `overwrite`: set `overwrite`
- `save_item`: no equivalent; the Item is always saved
- `absolute_path`: no equivalent. To create or retrieve the Asset hrefs as absolute paths, use either
  `Item#make_asset_hrefs_absolute()` or `Asset#get_absolute_href()`
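
The mapping above can be expressed as a small translation function. This is a sketch under stated assumptions, not part of stac-task: `translate_download_kwargs` is a hypothetical helper, and the strings `"FILE_NAME"` and `"KEY"` stand in for the `FileNameStrategy` members so that the example runs without stac-asset installed.

```python
from typing import Any


def translate_download_kwargs(old: dict[str, Any]) -> dict[str, Any]:
    """Map pre-0.5 download arguments onto DownloadConfig field names."""
    new: dict[str, Any] = {}
    if old.get("assets") is not None:
        new["include"] = list(old["assets"])
    if old.get("requester_pays"):
        new["s3_requester_pays"] = True
    if "keep_original_filenames" in old:
        # Placeholder strings naming the FileNameStrategy member to use.
        keep = old["keep_original_filenames"]
        new["file_name_strategy"] = "FILE_NAME" if keep else "KEY"
    if "overwrite" in old:
        new["overwrite"] = old["overwrite"]
    # save_item and absolute_path have no equivalent and are dropped.
    return new


old_kwargs = {
    "assets": ["data"],
    "requester_pays": True,
    "keep_original_filenames": True,
    "save_item": True,
}
print(translate_download_kwargs(old_kwargs))
```

In real code the placeholder string would be replaced by the corresponding `FileNameStrategy` member before the values are passed to `DownloadConfig`.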

### 0.5.x -> 0.6.0

Previously, the `validate` method was a _classmethod_ that validated the payload
passed as an argument. It is now an instance method that validates
the `self._payload` copy of the payload from which the `Task` instance is
constructed. The behavior is the same, in that construction fails if
validation fails, but implementers can now use the instance's
convenience methods during validation.

Previous implementations of `validate` would have been similar to this:

```python
    @classmethod
    def validate(cls, payload: dict[str, Any]) -> bool:
        # Check The Things™
        return isinstance(payload, dict)
```

They now need to be updated to this form:

```python
    def validate(self) -> bool:
        # Check The Things™
        return isinstance(self._payload, dict)
```

## Development

Clone, install in editable mode with development requirements, and install the **pre-commit** hooks:

```shell
git clone https://github.com/stac-utils/stac-task
cd stac-task
pip install -e '.[dev]'
pre-commit install
```

To run the tests:

```shell
pytest
```

To lint all the files:

```shell
pre-commit run --all-files
```

## Contributing

Use GitHub [issues](https://github.com/stac-utils/stac-task/issues) and [pull requests](https://github.com/stac-utils/stac-task/pulls).

            
