# Cog Worker
Scalable geospatial analysis on Cloud Optimized GeoTIFFs.
- **Documentation**: https://vizzuality.github.io/cog_worker
- **PyPI**: https://pypi.org/project/cog-worker
cog_worker is a simple library for writing scripts that run scalable analyses of gridded data. It is intended to be useful for moderate- to large-scale
GIS, remote sensing, and machine learning applications.
## Installation
```
pip install cog_worker
```
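The Dask-based features in `cog_worker.distributed` build on `dask.distributed`. If Dask is not already available in your environment, it can be installed separately (shown here with the standard Dask extra; adjust to however you manage Dask):

```
pip install "dask[distributed]"
```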
## Examples
See `docs/examples` for Jupyter notebook examples.
## Quick start
0. A simple cog_worker script
```python
from rasterio.plot import show
from cog_worker import Manager
def my_analysis(worker):
    arr = worker.read('roads_cog.tif')
    return arr
manager = Manager(proj='wgs84', scale=0.083333)
arr, bbox = manager.preview(my_analysis)
show(arr)
```
1. Define an analysis function that receives a `cog_worker.Worker` as its first parameter.
```python
from cog_worker import Worker, Manager
import numpy as np
# Define an analysis function to read and process COG data sources
def MyAnalysis(worker: Worker) -> np.ndarray:
    # 1. Read a COG (reprojecting, resampling and clipping as necessary)
    array: np.ndarray = worker.read('roads_cog.tif')

    # 2. Work on the array
    # ...

    # 3. Return (or post to blob storage etc.)
    return array
```
2. Run your analysis at different scales and in different projections
```python
import rasterio as rio
import rasterio.plot  # rio.plot must be imported explicitly

# Run your analysis using a cog_worker.Manager, which handles chunking
manager = Manager(
    proj='wgs84',                  # any pyproj string
    scale=0.083333,                # pixel size in projection units (degrees or meters)
    bounds=(-180, -90, 180, 90),
    buffer=128,                    # pixels to buffer by when chunking the analysis
)
# preview analysis
arr, bbox = manager.preview(MyAnalysis, max_size=1024)
rio.plot.show(arr)

# preview analysis chunks
for bbox in manager.chunks(chunksize=1500):
    print(bbox)

# execute analysis chunks sequentially
for arr, bbox in manager.chunk_execute(MyAnalysis, chunksize=1500):
    rio.plot.show(arr)

# generate job execution parameters
for params in manager.chunk_params(chunksize=1500):
    print(params)
```
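Because each chunk comes back with its bounding box, results can also be written out chunk by chunk. The sketch below shows one way to do that, assuming `arr` is a `(bands, rows, cols)` array, `bbox` is a `(west, south, east, north)` tuple in projection units, and `EPSG:4326` stands in for the `'wgs84'` projection configured above; the `chunks/` output directory is just an illustrative name.

```python
import os

import rasterio
from rasterio.transform import from_bounds

os.makedirs('chunks', exist_ok=True)
for i, (arr, bbox) in enumerate(manager.chunk_execute(MyAnalysis, chunksize=1500)):
    count, height, width = arr.shape
    with rasterio.open(
        f'chunks/chunk_{i}.tif',
        'w',
        driver='GTiff',
        count=count,
        height=height,
        width=width,
        dtype=str(arr.dtype),
        crs='EPSG:4326',  # matches the 'wgs84' projection used by the Manager above
        transform=from_bounds(*bbox, width=width, height=height),
    ) as dst:
        dst.write(arr)
```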
3. Write scale-dependent functions
```python
import scipy.ndimage

def focal_mean(
    worker: Worker,
    kernel_radius: float = 1000,  # radius in projection units (meters)
) -> np.ndarray:
    array: np.ndarray = worker.read('sample-geotiff.tif')

    # worker.scale gives the pixel size in projection units
    kernel_size = int(round(kernel_radius * 2 / worker.scale))
    array = scipy.ndimage.uniform_filter(array, kernel_size)

    return array
```
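Extra keyword arguments such as `kernel_radius` can be bound with `functools.partial`, so the Manager still receives a single-argument callable. A small sketch using the degree-based `'wgs84'` Manager from step 2 (the radius is given in degrees here; a meter-based radius would pair with a Manager configured in a metric projection):

```python
import functools

# Preview focal_mean with a non-default radius; roughly 1 degree at this Manager's scale.
smoothed, bbox = manager.preview(
    functools.partial(focal_mean, kernel_radius=1.0),
    max_size=1024,
)
rio.plot.show(smoothed)
```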
4. Chunk your analysis and run it on a Dask cluster
```python
from cog_worker.distributed import DaskManager
from dask.distributed import LocalCluster, Client
# Set up a DaskManager that connects to a Dask cluster
cluster = LocalCluster()
client = Client(cluster)
distributed_manager = DaskManager(
    client,
    proj='wgs84',
    scale=0.083333,
    bounds=(-180, -90, 180, 90),
    buffer=128,
)

# Execute in the worker pool and save chunks to disk as they complete.
distributed_manager.chunk_save('output.tif', MyAnalysis, chunksize=2048)
```
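After `chunk_save` completes, `output.tif` is a regular GeoTIFF and can be checked with rasterio; a minimal inspection (assuming a single-band result) might be:

```python
import rasterio
from rasterio.plot import show

with rasterio.open('output.tif') as src:
    print(src.crs, src.res, src.shape)
    show(src.read(1))
```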