# vector2dggs
[![pypi](https://img.shields.io/pypi/v/vector2dggs?label=vector2dggs)](https://pypi.org/project/vector2dggs/)

Python-based CLI tool to index vector geospatial data to DGGS in parallel, writing out to Parquet.

This is the vector equivalent of [raster2dggs](https://github.com/manaakiwhenua/raster2dggs).

Currently only supports the H3 DGGS, and probably has other limitations since it was developed for a specific internal use case, though it is intended as a general-purpose abstraction. Contributions, suggestions, bug reports and strongly worded letters are all welcome.

Currently only polygons are supported. Both coverages (strictly non-overlapping polygons) and sets of polygons that do or may overlap are supported: overlapping polygons are captured by allowing DGGS cell IDs to be non-unique (repeated) in the output.
![Example use case for vector2dggs, showing parcels indexed to a high H3 resolution](./docs/imgs/vector2dggs-example.png "Example use case for vector2dggs, showing parcels indexed to a high H3 resolution")
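Because overlaps show up as repeated cell IDs, a consumer of the output may want to find the cells claimed by more than one feature. Below is a minimal, standard-library sketch, assuming the output has already been loaded as `(cell ID, feature ID)` pairs; the helper name `overlapping_cells` and the example rows are hypothetical and not part of vector2dggs.

```python
from collections import Counter, defaultdict


def overlapping_cells(rows):
    """Map each cell ID that appears more than once to the feature IDs covering it.

    `rows` is an iterable of (cell_id, feature_id) pairs, e.g. read from the
    Parquet output (hypothetical helper, not part of vector2dggs).
    """
    rows = list(rows)
    counts = Counter(cell for cell, _ in rows)
    features = defaultdict(list)
    for cell, fid in rows:
        if counts[cell] > 1:  # only repeated cell IDs indicate an overlap
            features[cell].append(fid)
    return dict(features)


# Illustrative rows: features "A" and "B" overlap in one cell.
rows = [
    ("8cbb53a734553ff", "A"),
    ("8cbb53a734553ff", "B"),
    ("8cbb53a734467ff", "A"),
]
print(overlapping_cells(rows))  # {'8cbb53a734553ff': ['A', 'B']}
```

For a coverage (strictly non-overlapping polygons), this should return an empty dictionary.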
## Installation
```bash
pip install vector2dggs
```
## Usage
```bash
vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY

  Ingest a vector dataset and index it to the H3 DGGS.

  VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
  should be a directory, not a file or database table, as it will instead be
  the write location for an Apache Parquet data store.

Options:
  -v, --verbosity LVL             Either CRITICAL, ERROR, WARNING, INFO or
                                  DEBUG  [default: INFO]
  -r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 resolution to index  [required]
  -pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
                                  H3 Parent resolution for the output
                                  partition. Defaults to resolution - 6
  -id, --id_field TEXT            Field to use as an ID; defaults to a
                                  constructed single 0...n index on the
                                  original feature order.
  -k, --keep_attributes           Retain attributes in output. The default is
                                  to create an output that only includes H3
                                  cell ID and the ID given by the -id field
                                  (or the default index ID).
  -ch, --chunksize INTEGER        The number of rows per index partition to
                                  use when spatially partitioning. Adjusting
                                  this number will trade off memory use and
                                  time.  [default: 50; required]
  -s, --spatial_sorting [hilbert|morton|geohash]
                                  Spatial sorting method when performing
                                  spatial partitioning.  [default: hilbert]
  -crs, --cut_crs INTEGER         Set the coordinate reference system (CRS)
                                  used for cutting large polygons (see
                                  `--cut_threshold`). Defaults to the same CRS
                                  as the input. Should be a valid EPSG code.
  -c, --cut_threshold INTEGER     Cutting up large polygons into smaller
                                  pieces based on a target length. Units are
                                  assumed to match the input CRS units unless
                                  the `--cut_crs` is also given, in which case
                                  units match the units of the supplied CRS.
                                  [default: 5000; required]
  -t, --threads INTEGER           Amount of threads used for operation
                                  [default: 7]
  -tbl, --table TEXT              Name of the table to read when using a
                                  spatial database connection as input
  -g, --geom_col TEXT             Column name to use when using a spatial
                                  database connection as input  [default:
                                  geom]
  --tempdir PATH                  Temporary data is created during the
                                  execution of this program. This parameter
                                  allows you to control where this data will
                                  be written.
  -o, --overwrite
  --version                       Show the version and exit.
  --help                          Show this message and exit.
```
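`--cut_threshold` cuts large polygons into smaller pieces based on a target length, in the units of the input CRS (or of `--cut_crs`). The exact cutting algorithm used by vector2dggs is not described here, so the following is only one simple way to picture it, assuming a grid-based cut: tile the polygon's bounding box into squares whose side is the threshold, then intersect each tile with the polygon (for example with Shapely). The helper `cut_grid` is hypothetical and shows the tiling step only.

```python
import math


def cut_grid(bounds, threshold):
    """Return (minx, miny, maxx, maxy) tiles of side `threshold` covering `bounds`.

    Hypothetical helper illustrating a cut by target length; it is not
    vector2dggs's own implementation. Edge tiles are clipped to the bounds.
    """
    minx, miny, maxx, maxy = bounds
    # At least one tile per axis, even for a degenerate (zero-width) box.
    nx = max(1, math.ceil((maxx - minx) / threshold))
    ny = max(1, math.ceil((maxy - miny) / threshold))
    tiles = []
    for i in range(nx):
        for j in range(ny):
            x0 = minx + i * threshold
            y0 = miny + j * threshold
            tiles.append((x0, y0, min(x0 + threshold, maxx), min(y0 + threshold, maxy)))
    return tiles


# A 12 km x 4 km bounding box in a metre-based CRS, with the default threshold of 5000.
print(len(cut_grid((0, 0, 12000, 4000), 5000)))  # 3
```

A smaller threshold yields more, smaller pieces, which can be processed independently across `--threads`.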
## Visualising output
Output is in the Apache Parquet format, a directory with one file per partition.
For a quick view of your output, you can read the Apache Parquet data with pandas, then use h3-pandas and geopandas to convert it to GeoPackage or GeoParquet for visualisation in a desktop GIS such as QGIS. Each row of the output pairs an H3 cell ID with a feature ID (from the `--id_field` column, or the constructed default index), so it should be ready for two intended use cases:
- Joining attribute data from the original feature-level data onto the computed DGGS cells, via the feature ID.
- Joining other data to this output on the H3 cell ID. (The cell ID column is named after the target resolution, matching `h3_\d{2}`, e.g. `h3_09` or `h3_12`.)
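Both joins can be sketched with the standard library, assuming the output has been read into `(cell ID, feature ID)` pairs. Here `features` stands in for the original feature-level attributes and `cell_data` for some other dataset keyed by H3 cell ID; the helper `join_output` and all values are hypothetical and illustrative. In practice you would more likely do the same with `pandas.DataFrame.merge`.

```python
def join_output(rows, features, cell_data):
    """Attach feature attributes (via feature ID) and cell-level data (via H3 cell ID).

    rows:      iterable of (cell_id, feature_id) pairs from the indexed output
    features:  dict feature_id -> dict of attributes from the original vector data
    cell_data: dict cell_id -> value from another H3-indexed dataset
    Missing matches give no extra attributes / None, i.e. left-join semantics.
    """
    joined = []
    for cell_id, feature_id in rows:
        record = {"cell_id": cell_id, "feature_id": feature_id}
        record.update(features.get(feature_id, {}))    # use case 1: join on feature ID
        record["cell_value"] = cell_data.get(cell_id)  # use case 2: join on H3 cell ID
        joined.append(record)
    return joined


# Illustrative inputs only.
rows = [("8cbb53a734553ff", "NA94D/635"), ("8cbb53a548b11ff", "NA57C/785")]
features = {"NA94D/635": {"attr": "example"}}
cell_data = {"8cbb53a548b11ff": 42.0}
for record in join_output(rows, features, cell_data):
    print(record)
```

Because a cell may repeat when polygons overlap, a join on the cell ID can fan out to several rows per cell.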
GeoParquet output (hexagon boundaries):
```python
>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
title_no geometry
h3_12
8cbb53a734553ff NA94D/635 POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff NA94D/635 POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff NA94D/635 POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff NA94D/635 POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff NA94D/635 POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
... ... ...
8cbb53a548b2dff NA62D/324 POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff NA62D/324 POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff NA57C/785 POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff NA57C/785 POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff NA57C/785 POLYGON ((174.30149 -35.69332, 174.30147 -35.6...
[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')
```
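The cell IDs above (e.g. `8cbb53a734553ff`) are 64-bit H3 indexes written in hexadecimal. Per the H3 index specification, bits 52 to 55 hold the resolution, and a cell's ancestor at a coarser resolution is obtained by writing that resolution into those bits and setting every finer 3-bit digit to 7. This is the relationship behind the output partitions, which are keyed on the parent cell at `--parent_res` (by default resolution - 6). The pure-Python helpers `resolution` and `parent` below are a hypothetical illustration only; in practice the `h3` library provides equivalents (e.g. `h3.get_resolution` and `h3.cell_to_parent` in h3-py v4).

```python
RES_SHIFT = 52    # bits 52-55 hold the resolution
RES_MASK = 0xF
DIGIT_BITS = 3    # each of the 15 resolution digits is 3 bits wide
MAX_RES = 15


def resolution(cell: str) -> int:
    """Read the resolution field of a hexadecimal H3 cell index."""
    return (int(cell, 16) >> RES_SHIFT) & RES_MASK


def parent(cell: str, parent_res: int) -> str:
    """Return the ancestor of `cell` at `parent_res` (0 <= parent_res <= its resolution)."""
    h = int(cell, 16)
    res = (h >> RES_SHIFT) & RES_MASK
    if not 0 <= parent_res <= res:
        raise ValueError(f"parent_res must be between 0 and {res}")
    # Overwrite the resolution field with the parent resolution.
    h = (h & ~(RES_MASK << RES_SHIFT)) | (parent_res << RES_SHIFT)
    # Digits finer than parent_res are unused and must be set to 7 (0b111).
    for digit in range(parent_res + 1, MAX_RES + 1):
        h |= 0b111 << ((MAX_RES - digit) * DIGIT_BITS)
    return format(h, "x")


cell = "8cbb53a734553ff"                   # an h3_12 cell from the output above
print(resolution(cell))                    # 12
print(parent(cell, resolution(cell) - 6))  # 86bb53a77ffffff
```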
### For development
In brief, to get started:

- Install [Poetry](https://python-poetry.org/docs/basic-usage/)
- Install [GDAL](https://gdal.org/)
  - If you're on Windows, `pip install gdal` may be necessary before running the subsequent commands.
  - On Linux, install GDAL 3.6+ according to your platform-specific instructions, including development headers (e.g. `libgdal-dev` on Debian/Ubuntu).
- Create the virtual environment and install the necessary dependencies with `poetry install`.
- Subsequently, the virtual environment can be re-activated with `poetry shell`.

Because `poetry install` also installs this project into the virtual environment, the CLI tool is available as `vector2dggs` inside the activated environment; otherwise, use `poetry run vector2dggs`.

Alternatively, it is also possible to install using pip with `pip install -e .`, bypassing Poetry.
#### Code formatting
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
Please run `black .` before committing.
## Example commands
With a local GPKG:
```bash
vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet
```
With a PostgreSQL/PostGIS connection:
```bash
vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -tbl topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet
```
## Citation
```bibtex
@software{vector2dggs,
title={{vector2dggs}},
author={Ardo, James and Law, Richard},
url={https://github.com/manaakiwhenua/vector2dggs},
version={0.6.1},
date={2023-04-20}
}
```
APA/Harvard
> Ardo, J., & Law, R. (2023). vector2dggs (0.6.1) [Computer software]. https://github.com/manaakiwhenua/vector2dggs
[![manaakiwhenua-standards](https://github.com/manaakiwhenua/vector2dggs/workflows/manaakiwhenua-standards/badge.svg)](https://github.com/manaakiwhenua/manaakiwhenua-standards)