# cog-worker

- **Version**: 0.2.0
- **Home page**: https://github.com/vizzuality/cog_worker
- **Summary**: Scalable geospatial analysis on Cloud Optimized GeoTIFFs.
- **Author**: Francis Gassert
- **License**: MIT
- **Keywords**: cog, geotiff, raster, gdal, rasterio, dask
- **Uploaded**: 2023-07-14 15:07:33
# Cog Worker

Scalable geospatial analysis on Cloud Optimized GeoTIFFs.

 - **Documentation**: https://vizzuality.github.io/cog_worker
 - **PyPI**: https://pypi.org/project/cog-worker

cog_worker is a simple library for writing scripts that conduct scalable
analysis of gridded data. It's intended to be useful for moderate- to large-scale
GIS, remote sensing, and machine learning applications.

## Installation

```
pip install cog_worker
```

## Examples

See `docs/examples` for Jupyter notebook examples.

## Quick start

0. A simple cog_worker script

```python
from rasterio.plot import show
from cog_worker import Manager

def my_analysis(worker):
    arr = worker.read('roads_cog.tif')
    return arr

manager = Manager(proj='wgs84', scale=0.083333)
arr, bbox = manager.preview(my_analysis)
show(arr)
```

1. Define an analysis function that receives a `cog_worker.Worker` as its first parameter.

```python
from cog_worker import Worker, Manager
import numpy as np

# Define an analysis function to read and process COG data sources
def MyAnalysis(worker: Worker) -> np.ndarray:

    # 1. Read a COG (reprojecting, resampling and clipping as necessary)
    array: np.ndarray = worker.read('roads_cog.tif')

    # 2. Work on the array
    # ...

    # 3. Return (or post to blob storage etc.)
    return array
```
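The "work on the array" step can be any NumPy operation on the chunk. As a purely illustrative sketch (the nodata value and array are made up, not part of cog_worker's API), masking nodata before computing a statistic might look like:

```python
import numpy as np

# Hypothetical step 2: mask a nodata value (-1 here, illustrative)
# and compute the mean of the remaining valid pixels.
array = np.array([[1.0, 2.0, -1.0],
                  [4.0, -1.0, 6.0]])
valid = np.ma.masked_equal(array, -1.0)
print(valid.mean())  # mean of the four valid pixels: 3.25
```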

2. Run your analysis in different scales and projections

```python
import rasterio as rio

# Run your analysis using a cog_worker.Manager which handles chunking
manager = Manager(
    proj = 'wgs84',                 # any pyproj string
    scale = 0.083333,               # pixel size in projection units (degrees or meters)
    bounds = (-180, -90, 180, 90),  # analysis extent in projection units
    buffer = 128                    # pixels to buffer each chunk by
)

# preview analysis
arr, bbox = manager.preview(MyAnalysis, max_size=1024)
rio.plot.show(arr)

# preview analysis chunks
for bbox in manager.chunks(chunksize=1500):
    print(bbox)

# execute analysis chunks sequentially
for arr, bbox in manager.chunk_execute(MyAnalysis, chunksize=1500):
    rio.plot.show(arr)

# generate job execution parameters
for params in manager.chunk_params(chunksize=1500):
    print(params)
```
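For intuition, the number of chunks follows from the raster dimensions implied by `bounds` and `scale`. A rough sketch of that arithmetic, assuming simple edge-to-edge tiling (cog_worker's exact chunking may differ):

```python
import math

# Raster dimensions implied by the Manager settings above:
# global bounds at ~0.083333 degrees per pixel.
bounds = (-180, -90, 180, 90)
scale = 0.083333
width = round((bounds[2] - bounds[0]) / scale)   # ~4320 px
height = round((bounds[3] - bounds[1]) / scale)  # ~2160 px

# With chunksize=1500, a simple tiling needs ceil(width/1500) columns
# of chunks and ceil(height/1500) rows.
chunksize = 1500
n_chunks = math.ceil(width / chunksize) * math.ceil(height / chunksize)
print(width, height, n_chunks)  # 4320 2160 6
```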

3. Write scale-dependent functions

```python
import numpy as np
import scipy.ndimage

def focal_mean(
    worker: Worker,
    kernel_radius: float = 1000  # radius in projection units (meters)
) -> np.ndarray:

    array: np.ndarray = worker.read('sample-geotiff.tif')

    # Convert the radius to a filter size in pixels using worker.scale,
    # the pixel size in projection units.
    kernel_size = int(kernel_radius * 2 / worker.scale)
    array = scipy.ndimage.uniform_filter(array, kernel_size)

    return array
```
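The conversion inside `focal_mean` is plain unit arithmetic: `worker.scale` is the pixel size in projection units, so a radius in those units divides out to pixels. For example, with a 1000 m radius in a projection whose pixels are 250 m (an illustrative value, not one from the library):

```python
# Dividing the kernel diameter (in projection units) by the pixel size
# gives the filter size in pixels.
kernel_radius = 1000  # meters
scale = 250           # meters per pixel (illustrative)
kernel_size = int(kernel_radius * 2 / scale)
print(kernel_size)  # 8
```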

4. Chunk your analysis and run it in a dask cluster

```python
from cog_worker.distributed import DaskManager
from dask.distributed import LocalCluster, Client

# Set up a DaskManager that connects to a Dask cluster
cluster = LocalCluster()
client = Client(cluster)
distributed_manager = DaskManager(
    client,
    proj = 'wgs84',
    scale = 0.083333,
    bounds = (-180, -90, 180, 90),
    buffer = 128
)

# Execute in worker pool and save chunks to disk as they complete.
distributed_manager.chunk_save('output.tif', MyAnalysis, chunksize=2048)
```

            
