vector2dggs

Name	vector2dggs
Version	0.6.1
home_page	https://github.com/manaakiwhenua/vector2dggs
Summary	CLI DGGS indexer for vector geospatial data
upload_time	2024-08-15 04:23:03
maintainer	Richard Law
docs_url	None
author	James Ardo
requires_python	<4.0,>=3.11
license	LGPL-3.0-or-later
keywords	dggs vector h3 cli

# vector2dggs

[![pypi](https://img.shields.io/pypi/v/vector2dggs?label=vector2dggs)](https://pypi.org/project/vector2dggs/)

Python-based CLI tool to index vector files to DGGS in parallel, writing out to Parquet.

This is the vector equivalent of [raster2dggs](https://github.com/manaakiwhenua/raster2dggs).

Only the H3 DGGS is currently supported. The tool was developed for a specific internal use case, so it probably has other limitations, although it is intended as a general-purpose abstraction. Contributions, suggestions, bug reports and strongly worded letters are all welcome.

Only polygons are currently supported, but both coverages (strictly non-overlapping polygons) and sets of polygons that do or may overlap can be indexed. Overlapping polygons are captured by allowing DGGS cell IDs to be non-unique (repeated) in the output.
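
Because cell IDs can repeat, an overlap between input polygons shows up as one cell ID associated with more than one feature ID. The following is a minimal sketch of that check. The rows and feature IDs are made up for illustration and are not real output, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (cell ID, feature ID) rows standing in for indexed output.
# The first cell is covered by two features, so it appears twice.
rows = [
    ("8cbb53a734553ff", "A"),
    ("8cbb53a734467ff", "A"),
    ("8cbb53a734553ff", "B"),
    ("8cbb53a734445ff", "B"),
]


def features_by_cell(rows):
    """Group feature IDs by the cell ID they were indexed to."""
    grouped = defaultdict(set)
    for cell_id, feature_id in rows:
        grouped[cell_id].add(feature_id)
    return grouped


def overlapping_cells(rows):
    """Return only the cells that belong to more than one feature."""
    return {
        cell: sorted(features)
        for cell, features in features_by_cell(rows).items()
        if len(features) > 1
    }


overlaps = overlapping_cells(rows)
print(overlaps)
```

For a coverage (strictly non-overlapping input), the same check would be expected to return an empty result wherever each cell is assigned to a single polygon.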

![Example use case for vector2dggs, showing parcels indexed to a high H3 resolution](./docs/imgs/vector2dggs-example.png "Example use case for vector2dggs, showing parcels indexed to a high H3 resolution")

## Installation

```bash
pip install vector2dggs
```

## Usage

```bash
vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
  should be a directory, not a file or database table, as it will instead be
  the write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG  [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index  [required]
  -pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 Parent resolution for the output
                                  partition. Defaults to resolution - 6
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -k, --keep_attributes           Retain attributes in output. The default is
                                  to create an output that only includes H3
                                  cell ID and the ID given by the -id field
                                  (or the default index ID).
  -ch, --chunksize INTEGER        The number of rows per index partition to
                                  use when spatially partioning. Adjusting
                                  this number will trade off memory use and
                                  time.  [default: 50; required]
  -s, --spatial_sorting [hilbert|morton|geohash]
                                  Spatial sorting method when perfoming
                                  spatial partitioning.  [default: hilbert]
  -crs, --cut_crs INTEGER         Set the coordinate reference system (CRS)
                                  used for cutting large polygons (see `--cur-
                                  threshold`). Defaults to the same CRS as the
                                  input. Should be a valid EPSG code.
  -c, --cut_threshold INTEGER     Cutting up large polygons into smaller
                                  pieces based on a target length. Units are
                                  assumed to match the input CRS units unless
                                  the `--cut_crs` is also given, in which case
                                  units match the units of the supplied CRS.
                                  [default: 5000; required]
  -t, --threads INTEGER           Amount of threads used for operation
                                  [default: 7]
  -tbl, --table TEXT              Name of the table to read when using a
                                  spatial database connection as input
  -g, --geom_col TEXT             Column name to use when using a spatial
                                  database connection as input  [default:
                                  geom]
  --tempdir PATH                  Temporary data is created during the
                                  execution of this program. This parameter
                                  allows you to control where this data will
                                  be written.
  -o, --overwrite
  --version                       Show the version and exit.
  --help                          Show this message and exit.
```
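
The help text says the partition resolution defaults to `resolution - 6`. As a rough guide to what that means for partition sizes, the sketch below computes the default and an upper bound on the number of cells that can fall under one parent cell, using the fact that an H3 cell has at most 7 children at the next resolution. Clamping the default at resolution 0 is an assumption of this sketch; the help text does not say what the tool does for resolutions below 6. The function names are hypothetical.

```python
DEFAULT_PARENT_OFFSET = 6  # from the help text: "Defaults to resolution - 6"
MIN_H3_RESOLUTION = 0
MAX_H3_RESOLUTION = 15


def default_parent_res(resolution: int) -> int:
    """Parent resolution used for partitioning when -pr is not given.

    Clamping at 0 is an assumption made for this sketch.
    """
    if not MIN_H3_RESOLUTION <= resolution <= MAX_H3_RESOLUTION:
        raise ValueError("H3 resolutions run from 0 to 15")
    return max(MIN_H3_RESOLUTION, resolution - DEFAULT_PARENT_OFFSET)


def max_cells_per_partition(resolution: int, parent_res: int) -> int:
    """Upper bound on cells under one parent: at most 7 children per level."""
    if parent_res > resolution:
        raise ValueError("parent resolution must not exceed the target resolution")
    return 7 ** (resolution - parent_res)


res = 12
parent = default_parent_res(res)
print(parent, max_cells_per_partition(res, parent))
```

Choosing a finer `-pr` value gives more, smaller partitions; a coarser one gives fewer, larger partitions.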

## Visualising output

Output is in the Apache Parquet format, a directory with one file per partition.
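
Since the output is a directory rather than a single file, a quick way to check what was written is to list its contents. This is a minimal sketch using only the standard library; it makes no assumption about how the partition files are named, and `list_partitions` is a hypothetical helper, not part of vector2dggs.

```python
from pathlib import Path


def list_partitions(output_directory):
    """Return (file name, size in bytes) for each file in the output directory."""
    directory = Path(output_directory)
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sorted(
        (path.name, path.stat().st_size)
        for path in directory.iterdir()
        if path.is_file()
    )
```

For example, `list_partitions("./output-data/nz-property-titles.12.parquet")` would list the partition files of the output used in the example further down.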

For a quick view of your output, you can read Apache Parquet with pandas, and then use h3-pandas and geopandas to convert this into a GeoPackage or GeoParquet for visualisation in a desktop GIS, such as QGIS. The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use-cases:
- Joining attribute data from the original feature-level data onto computed DGGS cells.
- Joining other data to this output on the H3 cell ID. (The output has a column like `h3_\d{2}`, e.g. `h3_09` or `h3_12` according to the target resolution.)
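
The two joins can be sketched with pandas as follows. The data frames here are small hand-made stand-ins, not real output: the layout (H3 cell ID as the index, feature ID as a column) follows the example output shown in this README, and the `land_use` and `elevation_m` columns are hypothetical.

```python
import pandas as pd


def h3_column(resolution: int) -> str:
    """Column name for a given resolution, e.g. 9 -> 'h3_09'."""
    return f"h3_{resolution:02d}"


col = h3_column(12)

# Hypothetical stand-in for vector2dggs output: cell ID index, feature ID column.
indexed = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA94D/635", "NA62D/324"]},
    index=pd.Index(
        ["8cbb53a734553ff", "8cbb53a734467ff", "8cbb53a548b2dff"], name=col
    ),
)

# Use case 1: feature-level attributes joined on the ID column.
attributes = pd.DataFrame(
    {"title_no": ["NA94D/635", "NA62D/324"], "land_use": ["pasture", "forest"]}
)
with_attributes = indexed.reset_index().merge(attributes, on="title_no", how="left")

# Use case 2: another cell-level dataset joined on the H3 cell ID.
other = pd.DataFrame(
    {"elevation_m": [12.5, 40.0]},
    index=pd.Index(["8cbb53a734553ff", "8cbb53a548b2dff"], name=col),
)
with_other = indexed.join(other, how="left")

print(with_attributes)
print(with_other)
```

A left join keeps every indexed cell, so cells with no match in the other dataset end up with missing values rather than being dropped.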

GeoParquet output (hexagon boundaries):

```python
>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
                  title_no                                           geometry
h3_12                                                                        
8cbb53a734553ff  NA94D/635  POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff  NA94D/635  POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff  NA94D/635  POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff  NA94D/635  POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff  NA94D/635  POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
...                    ...                                                ...
8cbb53a548b2dff  NA62D/324  POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff  NA62D/324  POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff  NA57C/785  POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff  NA57C/785  POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff  NA57C/785  POLYGON ((174.30149 -35.69332, 174.30147 -35.6...

[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')
```

### For development

In brief, to get started:

- Install [Poetry](https://python-poetry.org/docs/basic-usage/)
- Install [GDAL](https://gdal.org/)
    - If you're on Windows, `pip install gdal` may be necessary before running the subsequent commands.
    - On Linux, install GDAL 3.6+ according to your platform-specific instructions, including development headers, i.e. `libgdal-dev`.
- Create the virtual environment with `poetry install`. This will install necessary dependencies.
- Subsequently, the virtual environment can be re-activated with `poetry shell`.

If you run `poetry install`, the CLI tool will be aliased so you can simply use `vector2dggs` rather than `poetry run vector2dggs`, which is the alternative if you do not `poetry install`.

Alternatively, you can bypass Poetry and install with pip using `pip install -e .`.

#### Code formatting

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Please run `black .` before committing.

## Example commands

With a local GPKG:

```bash
vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet
```

With a PostgreSQL/PostGIS connection:

```bash
vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -tbl topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet
```
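
When the command is assembled from a script, it can help to build the argument list in Python, which avoids shell quoting problems. The sketch below uses only flags listed in `vector2dggs h3 --help`. The helper names are hypothetical, and percent-encoding the credentials assumes the connection string is parsed the way standard database URLs are, which this README does not confirm.

```python
import shlex
from urllib.parse import quote


def postgis_url(user, password, host, port, db):
    """Build a connection URL, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


def build_h3_command(vector_input, output_directory, resolution, *,
                     id_field=None, table=None, threads=None, overwrite=False):
    """Assemble an argument list using flags from `vector2dggs h3 --help`."""
    args = ["vector2dggs", "h3", "-r", str(resolution)]
    if id_field is not None:
        args += ["-id", id_field]
    if table is not None:
        args += ["-tbl", table]
    if threads is not None:
        args += ["-t", str(threads)]
    if overwrite:
        args.append("--overwrite")
    # Positional arguments go last: VECTOR_INPUT then OUTPUT_DIRECTORY.
    return args + [vector_input, output_directory]


# Placeholder credentials; "@" and "/" in the password are encoded.
url = postgis_url("user", "p@ss/word", "localhost", 5432, "db")
command = build_h3_command(
    url, "./topo50_lake.parquet", 9,
    id_field="ogc_fid", table="topo50_lake", threads=4, overwrite=True,
)
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)` once vector2dggs is installed.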

## Citation

```bibtex
@software{vector2dggs,
  title={{vector2dggs}},
  author={Ardo, James and Law, Richard},
  url={https://github.com/manaakiwhenua/vector2dggs},
  version={0.6.1},
  date={2023-04-20}
}
```

APA/Harvard

> Ardo, J., & Law, R. (2023). vector2dggs (0.6.1) [Computer software]. https://github.com/manaakiwhenua/vector2dggs

[![manaakiwhenua-standards](https://github.com/manaakiwhenua/vector2dggs/workflows/manaakiwhenua-standards/badge.svg)](https://github.com/manaakiwhenua/manaakiwhenua-standards)
