geobeam


| **Field**       | **Value** |
|:----------------|:----------|
| Name            | geobeam |
| Version         | 1.1.2 |
| Summary         | geobeam adds GIS capabilities to your Apache Beam pipelines |
| Upload time     | 2023-05-16 17:11:39 |
| Author          | Travis Webb |
| Requires Python | >=3.8 |
| Keywords        | beam, dataflow, gdal, gis |

geobeam adds GIS capabilities to your Apache Beam pipelines.

## What does geobeam do?

`geobeam` enables you to ingest and analyze massive amounts of geospatial data in parallel using [Dataflow](https://cloud.google.com/dataflow).
`geobeam` provides a set of [FileBasedSource](https://beam.apache.org/releases/pydoc/2.41.0/apache_beam.io.filebasedsource.html)
classes that make it easy to read, process, and write geospatial data, along with
Apache Beam transforms and utilities for processing GIS data in your Dataflow pipelines.

See the [Full Documentation](https://storage.googleapis.com/geobeam/docs/all.pdf) for the complete API specification.

### Requirements
- Apache Beam 2.46+
- Python 3.8+

> Note: Make sure the Python version used to run the pipeline matches the version in the built container.
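
One way to catch a mismatch early is to compare the local interpreter with the version you expect inside the container before launching a job. This is a minimal sketch: `EXPECTED` is a placeholder you would set by hand, and `python_matches` is a hypothetical helper, not part of geobeam.

```python
import sys

# Placeholder: the major.minor Python version inside your container image.
EXPECTED = (3, 8)


def python_matches(expected, actual=None):
    """Return True when the major.minor versions are equal."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(expected[:2])


if not python_matches(EXPECTED):
    local = ".".join(str(part) for part in sys.version_info[:2])
    wanted = ".".join(str(part) for part in EXPECTED)
    print(f"Warning: local Python {local} differs from container Python {wanted}")
```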

### Supported input types

| **File format** | **Data type** | **Geobeam class**  |
|:----------------|:--------------|:-------------------|
| `tiff`          | raster        | `RasterBlockSource` and `RasterPolygonSource` |
| `shp`           | vector        | `ShapefileSource` |
| `gdb`           | vector        | `GeodatabaseSource` |
| `json`          | vector        | `GeoJSONSource` |
| URL             | vector        | `ESRIServerSource` |
### Included libraries

`geobeam` includes several Python modules that let you perform a wide variety of operations and analyses on your geospatial data.

| **Module**      | **Version** | **Description** |
|:----------------|:------------|:----------------|
| [gdal](https://pypi.org/project/GDAL/)          | 3.5.2       | Python bindings for GDAL |
| [rasterio](https://pypi.org/project/rasterio/)  | 1.3.2       | reads and writes geospatial raster data |
| [fiona](https://pypi.org/project/Fiona/)        | 1.8.21      | reads and writes geospatial vector data |
| [shapely](https://pypi.org/project/Shapely/)    | 1.8.4       | manipulation and analysis of geometric objects in the Cartesian plane |
| [esridump](https://pypi.org/project/esridump/)  | 1.11.0      | reads layers from an ESRI server |


## How to Use

### 1. Install the module
```
pip install geobeam
```

### 2. Write your pipeline

Write a normal Apache Beam pipeline using one of `geobeam`'s file sources.
See [`geobeam/examples`](https://github.com/GoogleCloudPlatform/dataflow-geobeam/tree/main/geobeam/examples) for inspiration.
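
As a rough starting point, a pipeline module might look like the sketch below. It reuses the `--gcs_url`, `--dataset`, and `--table` flags that appear in the run commands in this README; the split into `parse_args` and `run` is an assumption about layout rather than the structure of the bundled examples, and the choice of `GeoJSONSource` is only illustrative.

```python
import argparse


def parse_args(argv=None):
    """Separate this pipeline's own flags from the ones Beam consumes."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gcs_url", required=True)  # input file location
    parser.add_argument("--dataset", required=True)  # BigQuery dataset
    parser.add_argument("--table", required=True)    # BigQuery table
    # Unrecognized flags (e.g. --runner) are returned untouched for Beam.
    return parser.parse_known_args(argv)


def run(argv=None):
    # Imported inside the function, as in the examples in this README.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from geobeam.io import GeoJSONSource
    from geobeam.fn import format_record

    known_args, pipeline_args = parse_args(argv)
    options = PipelineOptions(pipeline_args)

    with beam.Pipeline(options=options) as p:
        (p | "Read" >> beam.io.Read(GeoJSONSource(known_args.gcs_url))
           | "Format" >> beam.Map(format_record)
           | "Write" >> beam.io.WriteToBigQuery(
               table=known_args.table, dataset=known_args.dataset))
```

Calling `run()` from an `if __name__ == "__main__":` guard lets the module be started with `python -m`, in the same way as the commands in the next step.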

### 3. Run

#### Run locally

```
python -m geobeam.examples.geotiff_dem \
  --gcs_url gs://geobeam/examples/dem-clipped-test.tif \
  --dataset examples \
  --table dem \
  --band_column elev \
  --runner DirectRunner \
  --temp_location <temp gs://> \
  --project <project_id>
```

> Note: Some of the provided examples may take a very long time to run locally...

#### Run in Dataflow

##### Write a Dockerfile

Your pipeline runs in Dataflow as a [custom container](https://cloud.google.com/dataflow/docs/guides/using-custom-containers) based on the [`dataflow-geobeam/base`](Dockerfile) image.
We recommend publishing your own container, built from the Dockerfile in this repository, to your project's Container Registry (GCR).


```dockerfile
FROM gcr.io/dataflow-geobeam/base

RUN pip install geobeam

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
```

```bash
# build locally with docker (the trailing "." is the build context)
docker build -t gcr.io/<project_id>/geobeam .
docker push gcr.io/<project_id>/geobeam

# or build with Cloud Build
gcloud builds submit --timeout 3600s --machine-type n1-highcpu-8
```

##### Start the Dataflow job

```
# run the geotiff_soilgrid example in dataflow
python -m geobeam.examples.geotiff_soilgrid \
  --gcs_url gs://geobeam/examples/AWCh3_M_sl1_250m_ll.tif \
  --dataset examples \
  --table soilgrid \
  --band_column h3 \
  --runner DataflowRunner \
  --sdk_container_image gcr.io/dataflow-geobeam/base \
  --temp_location <temp bucket> \
  --service_account_email <service account> \
  --region us-central1 \
  --max_num_workers 2 \
  --worker_machine_type c2-standard-30
```
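
If you launch jobs from Python rather than from a shell, the same flags can be collected into a list and handed to Beam's `PipelineOptions`. In this sketch, `dataflow_flags` and `make_options` are hypothetical helpers; the flag names and default values are taken from the command above, with `--project` borrowed from the local run command.

```python
def dataflow_flags(project, temp_location, image, region="us-central1",
                   max_num_workers=2, machine_type="c2-standard-30"):
    """Build the list of Dataflow flags shown in the command above."""
    settings = {
        "runner": "DataflowRunner",
        "project": project,
        "temp_location": temp_location,
        "sdk_container_image": image,
        "region": region,
        "max_num_workers": max_num_workers,
        "worker_machine_type": machine_type,
    }
    return [f"--{name}={value}" for name, value in settings.items()]


def make_options(flags):
    # Imported lazily so that building the flag list does not require Beam.
    from apache_beam.options.pipeline_options import PipelineOptions
    return PipelineOptions(flags)
```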


## Examples

#### Read Raster as Blocks
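```py
import apache_beam as beam

def run(options, gcs_url):
  # options: PipelineOptions for the job; gcs_url: gs:// path of the input raster
  from geobeam.io import RasterBlockSource
  from geobeam.fn import format_rasterblock_record

  # pass options by keyword: the first positional argument of Pipeline is the runner
  with beam.Pipeline(options=options) as p:
    (p  | 'ReadRaster' >> beam.io.Read(RasterBlockSource(gcs_url))
        | 'FormatRecord' >> beam.Map(format_rasterblock_record)
        | 'WriteToBigquery' >> beam.io.WriteToBigQuery('geo.dem'))
```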

#### Validate and Simplify Shapefile

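```py
import apache_beam as beam

def run(options, gcs_url):
  # options: PipelineOptions for the job; gcs_url: gs:// path of the input shapefile
  from geobeam.io import ShapefileSource
  from geobeam.fn import make_valid, filter_invalid, format_record

  # pass options by keyword: the first positional argument of Pipeline is the runner
  with beam.Pipeline(options=options) as p:
    (p  | 'ReadShapefile' >> beam.io.Read(ShapefileSource(gcs_url))
        | 'Validate' >> beam.Map(make_valid)
        | 'FilterInvalid' >> beam.Filter(filter_invalid)
        | 'FormatRecord' >> beam.Map(format_record)
        | 'WriteToBigquery' >> beam.io.WriteToBigQuery('geo.parcel'))
```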

Complete example pipelines are available in the `geobeam/examples/` folder.
To run them in your Google Cloud project, first apply the included [terraform](https://www.terraform.io) file to set up the BigQuery dataset and tables that the example pipelines use.

Open BigQuery GeoViz to visualize your data.

### Shapefile Example

The National Flood Hazard Layer, loaded from a shapefile. Example pipeline: [`geobeam/examples/shapefile_nfhl.py`](https://github.com/GoogleCloudPlatform/dataflow-geobeam/blob/main/geobeam/examples/shapefile_nfhl.py).

![](https://storage.googleapis.com/geobeam/examples/geobeam-nfhl-geoviz-example.png)

### Raster Example

The Digital Elevation Model is a high-resolution elevation model with 1-meter resolution (values converted to centimeters). Example pipeline: [`geobeam/examples/geotiff_dem.py`](https://github.com/GoogleCloudPlatform/dataflow-geobeam/blob/main/geobeam/examples/geotiff_dem.py).

![](https://storage.googleapis.com/geobeam/examples/geobeam-dem-example-geoviz.png)

## Included Transforms

The `geobeam.fn` module includes several [Beam Transforms](https://beam.apache.org/documentation/programming-guide/#transforms) that you can use in your pipelines.

| **Transform**   | **Description** |
|:----------------|:----------------|
| `geobeam.fn.make_valid`     | Attempt to make all geometries valid |
| `geobeam.fn.filter_invalid` | Filter out invalid geometries that cannot be made valid |
| `geobeam.fn.format_record`  | Format the (props, geom) tuple received from a vector source into a `dict` that can be inserted into the destination table |
| `geobeam.fn.format_rasterblock_record` | Format the output record for blocks read from `RasterBlockSource` |
| `geobeam.fn.format_rasterpolygon_record` | Format the output record for polygons read from `RasterPolygonSource` |
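
Your own functions can sit alongside these in a pipeline. The sketch below assumes, as the `format_record` description suggests, that vector sources emit `(props, geom)` tuples. `add_source_name` is a hypothetical helper, and the dict-shaped `geom` in the sample element is only illustrative.

```python
def add_source_name(element, source_name):
    """Return a new (props, geom) tuple with a `source` property added."""
    props, geom = element
    # Copy the properties so the input element is left unmodified.
    updated = dict(props)
    updated["source"] = source_name
    return updated, geom


# Illustrative element; real elements come from a geobeam vector source.
sample = ({"zone": "AE"}, {"type": "Point", "coordinates": [0.0, 0.0]})
print(add_source_name(sample, "nfhl"))
```

In a pipeline this could be applied as `beam.Map(add_source_name, source_name='nfhl')`, placed before the `format_record` step.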


## Execution parameters

Each file source accepts several parameters that configure how your data is loaded and processed.
You can parse these as pipeline arguments and pass them to the respective source, as the example pipelines do.

| **Parameter**      | **Input type** | **Description** | **Default** | **Required?** |
|:-------------------|:---------------|:----------------|:------------|:--------------|
| `skip_reproject`   | All     | True to skip reprojection during read | `False` | No |
| `in_epsg`          | All     | An [EPSG integer](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset) to override the input source CRS to reproject from | | No |
| `in_proj`          | All     | A [PROJ string](https://proj.org/usage/quickstart.html) to override the input source CRS | | No |
| `band_number`      | Raster  | The raster band to read from | `1` | No |
| `include_nodata`   | Raster  | True to include `nodata` values | `False` | No |
| `return_block_transform` | Raster | True to include rasterio `transform` object with each block to use with `geobeam.fn.format_rasterpixel_record` | `False` | No |
| `layer_name`       | Vector  | Name of layer to read | | Yes, for shapefiles |
| `gdb_name`         | Vector  | Name of geodatabase directory in a gdb zip archive | | Yes, for GDB files |
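
As an illustration of wiring these through, the sketch below parses a few of the parameters from the command line and forwards only the ones that were set. It assumes the source classes accept the table's parameter names as keyword arguments. `parse_source_args`, `source_kwargs`, and `make_source` are hypothetical helpers, not part of geobeam.

```python
import argparse


def parse_source_args(argv=None):
    """Parse a subset of the parameters in the table above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gcs_url", required=True)
    parser.add_argument("--band_number", type=int, default=1)
    parser.add_argument("--include_nodata", action="store_true")
    parser.add_argument("--in_epsg", type=int, default=None)
    return parser.parse_known_args(argv)


def source_kwargs(known_args):
    """Drop unset options so the source's own defaults apply to them."""
    candidates = {
        "band_number": known_args.band_number,
        "include_nodata": known_args.include_nodata,
        "in_epsg": known_args.in_epsg,
    }
    return {name: value for name, value in candidates.items() if value is not None}


def make_source(known_args):
    # Assumption: the parameters are accepted as keyword arguments.
    from geobeam.io import RasterBlockSource
    return RasterBlockSource(known_args.gcs_url, **source_kwargs(known_args))
```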


## License

This is not an officially supported Google product, though support will be provided on a best-effort basis.

```
Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```



            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "geobeam",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "beam dataflow gdal gis",
    "author": "Travis Webb",
    "author_email": "traviswebb@google.com",
    "download_url": "https://files.pythonhosted.org/packages/e7/fe/dd14332f3e5e5cfef654d9f21e791fa11bcf890df431f757b70cc9dfa965/geobeam-1.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "geobeam adds GIS capabilities to your Apache Beam pipelines",
    "version": "1.1.2",
    "project_urls": null,
    "split_keywords": [
        "beam",
        "dataflow",
        "gdal",
        "gis"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "30543915b5b8e3607102fbca02dc20598c90fb8eb36f142eb0f97669a6afff61",
                "md5": "8a029c7a1c4685bb1f8d347729db3930",
                "sha256": "5f18c88e00c960a616d684ca32cffa4ebc9ddfd54a695bbf263a356abac996af"
            },
            "downloads": -1,
            "filename": "geobeam-1.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8a029c7a1c4685bb1f8d347729db3930",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 34371,
            "upload_time": "2023-05-16T17:11:38",
            "upload_time_iso_8601": "2023-05-16T17:11:38.204463Z",
            "url": "https://files.pythonhosted.org/packages/30/54/3915b5b8e3607102fbca02dc20598c90fb8eb36f142eb0f97669a6afff61/geobeam-1.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e7fedd14332f3e5e5cfef654d9f21e791fa11bcf890df431f757b70cc9dfa965",
                "md5": "534dbd6eec1537d3a7b626d3bdf8b215",
                "sha256": "21fc8013690b0fc409159af80ea90dfc188b07562d03c5fe21262eef745b6b2f"
            },
            "downloads": -1,
            "filename": "geobeam-1.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "534dbd6eec1537d3a7b626d3bdf8b215",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18818,
            "upload_time": "2023-05-16T17:11:39",
            "upload_time_iso_8601": "2023-05-16T17:11:39.967009Z",
            "url": "https://files.pythonhosted.org/packages/e7/fe/dd14332f3e5e5cfef654d9f21e791fa11bcf890df431f757b70cc9dfa965/geobeam-1.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-16 17:11:39",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "geobeam"
}
        