# vector2dggs
[PyPI](https://pypi.org/project/vector2dggs/)
Python-based CLI tool to index vector files to DGGS in parallel, writing out to Parquet.
This is the vector equivalent of [raster2dggs](https://github.com/manaakiwhenua/raster2dggs).
Currently this tool supports the following DGGSs:
- [H3](https://h3geo.org/)
- [rHEALPix](https://datastore.landcareresearch.co.nz/dataset/rhealpix-discrete-global-grid-system)
- [S2](https://s2geometry.io/)
... and the following geocode systems:
- [Geohash](https://en.wikipedia.org/wiki/Geohash) (points, polygons)
Contributions (especially for other DGGSs), suggestions, bug reports and strongly worded letters are all welcome.

## Installation
```bash
pip install vector2dggs
```
## Usage
```bash
vector2dggs --help
Usage: vector2dggs [OPTIONS] COMMAND [ARGS]...
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
geohash Ingest a vector dataset and index it using the Geohash geocode...
h3 Ingest a vector dataset and index it to the H3 DGGS.
rhp Ingest a vector dataset and index it to the rHEALPix DGGS.
s2 Ingest a vector dataset and index it to the S2 DGGS.
```
```bash
vector2dggs h3 --help
Usage: vector2dggs h3 [OPTIONS] VECTOR_INPUT OUTPUT_DIRECTORY
Ingest a vector dataset and index it to the H3 DGGS.
VECTOR_INPUT is the path to input vector geospatial data. OUTPUT_DIRECTORY
should be a directory, not a file or database table, as it will instead be
the write location for an Apache Parquet data store.
Options:
-v, --verbosity LVL Either CRITICAL, ERROR, WARNING, INFO or
DEBUG [default: INFO]
-r, --resolution [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
H3 resolution to index [required]
-pr, --parent_res [0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15]
H3 Parent resolution for the output
partition. Defaults to resolution - 6
-id, --id_field TEXT Field to use as an ID; defaults to a
constructed single 0...n index on the
original feature order.
-k, --keep_attributes Retain attributes in output. The default is
to create an output that only includes H3
cell ID and the ID given by the -id field
(or the default index ID).
-ch, --chunksize INTEGER The number of rows per index partition to
use when spatially partitioning. Adjusting
this number will trade off memory use and
time. [default: 50; required]
-s, --spatial_sorting [hilbert|morton|geohash|none]
Spatial sorting method when performing
spatial partitioning. [default: none]
-crs, --cut_crs INTEGER Set the coordinate reference system (CRS)
used for cutting large geometries (see
`--cut_threshold`). Defaults to the same CRS
as the input. Should be a valid EPSG code.
-c, --cut_threshold INTEGER Cutting up large geometries into smaller
geometries based on a target length. Units
are assumed to match the input CRS units
unless the `--cut_crs` is also given, in
which case units match the units of the
supplied CRS. [default: 5000; required]
-t, --threads INTEGER Number of threads used for operation
[default: NUM_CPUS - 1]
-cp, --compression TEXT Compression method to use for the output
Parquet files. Options include 'snappy',
'gzip', 'brotli', 'lz4', 'zstd', etc. Use
'none' for no compression. [default:
snappy]
-lyr, --layer TEXT Name of the layer or table to read when
using an input that supports layers or
tables
-g, --geom_col TEXT Column name to use when using a spatial
database connection as input [default:
geom]
--tempdir PATH Temporary data is created during the
execution of this program. This parameter
allows you to control where this data will
be written.
-co, --compact Compact the H3 cells up to the parent
resolution. Compaction requires an id_field.
-o, --overwrite
--version Show the version and exit.
--help Show this message and exit.
```
## Visualising output
Output is in the Apache Parquet format, a directory with one file per partition.
For a quick view of your output, you can read Apache Parquet with pandas, and then use tools like h3-pandas and geopandas to convert this into a GeoPackage or GeoParquet for visualisation in a desktop GIS, such as QGIS. The Apache Parquet output is indexed by an ID column (which you can specify), so it should be ready for two intended use-cases:
- Joining attribute data from the original feature-level data onto computed DGGS cells.
- Joining other data to this output on the DGGS cell ID. (The output has a column like `{dggs}_\d`, e.g. `h3_09` or `h3_12` according to the target resolution, zero-padded to account for the maximum resolution of the DGGS)
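The first use-case can be sketched with plain pandas. The cell IDs, column names, and attribute table below are hypothetical; the zero-padded column name (`h3_09`) follows the convention described above:

```python
import pandas as pd

RES = 9
cell_col = f"h3_{RES:02d}"  # zero-padded column name, e.g. "h3_09"

# Hypothetical vector2dggs-style output: cells indexed by the
# target-resolution cell ID, carrying the original feature ID.
cells = pd.DataFrame(
    {cell_col: ["8909a1d8d03ffff", "8909a1d8d07ffff"], "feature_id": [10, 10]}
).set_index(cell_col)

# Feature-level attribute table from the original dataset (illustrative).
attrs = pd.DataFrame({"feature_id": [10], "land_use": ["forest"]})

# Attach feature-level attributes onto each DGGS cell.
joined = cells.reset_index().merge(attrs, on="feature_id", how="left")
print(joined[["feature_id", "land_use"]])
```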
Geoparquet output (hexagon boundaries):
```python
>>> import pandas as pd
>>> import h3pandas
>>> g = pd.read_parquet('./output-data/nz-property-titles.12.parquet').h3.h3_to_geo_boundary()
>>> g
title_no geometry
h3_12
8cbb53a734553ff NA94D/635 POLYGON ((174.28483 -35.69315, 174.28482 -35.6...
8cbb53a734467ff NA94D/635 POLYGON ((174.28454 -35.69333, 174.28453 -35.6...
8cbb53a734445ff NA94D/635 POLYGON ((174.28416 -35.69368, 174.28415 -35.6...
8cbb53a734551ff NA94D/635 POLYGON ((174.28496 -35.69329, 174.28494 -35.6...
8cbb53a734463ff NA94D/635 POLYGON ((174.28433 -35.69335, 174.28432 -35.6...
... ... ...
8cbb53a548b2dff NA62D/324 POLYGON ((174.30249 -35.69369, 174.30248 -35.6...
8cbb53a548b61ff NA62D/324 POLYGON ((174.30232 -35.69402, 174.30231 -35.6...
8cbb53a548b11ff NA57C/785 POLYGON ((174.30140 -35.69348, 174.30139 -35.6...
8cbb53a548b15ff NA57C/785 POLYGON ((174.30161 -35.69346, 174.30160 -35.6...
8cbb53a548b17ff NA57C/785 POLYGON ((174.30149 -35.69332, 174.30147 -35.6...
[52736 rows x 2 columns]
>>> g.to_parquet('./output-data/parcels.12.geo.parquet')
```
An example for S2 output (using `s2sphere`):
```python
import pandas as pd
import geopandas as gpd
import s2sphere
from shapely.geometry import Polygon
RES = 18
df = pd.read_parquet(f'~/output-data/ponds-with-holes.s2.{RES}.pq')
def s2id_to_polygon(s2_id_hex):
cell_id = s2sphere.CellId.from_token(s2_id_hex)
cell = s2sphere.Cell(cell_id)
vertices = []
for i in range(4):
vertex = cell.get_vertex(i)
lat_lng = s2sphere.LatLng.from_point(vertex)
vertices.append((lat_lng.lng().degrees, lat_lng.lat().degrees)) # (lon, lat)
return Polygon(vertices)
df['geometry'] = df.index.to_series().apply(s2id_to_polygon)
df = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326') # WGS84
df.to_parquet(f'sample-{RES}.parquet')
```
## Compaction
Compaction is supported with the `-co/--compact` argument. The result respects overlapping polygons by considering each feature independently. (In the below example output for rHEALPix, cells are shown with opacity; overlap is visible where there is a darker shade.) This does mean that the index of the result is not necessarily unique (unless your input is a vector _coverage_, i.e. it does not have overlaps.)
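The idea behind compaction can be illustrated with a toy quadkey-style grid (four children per parent, child IDs formed by appending a digit). This is a conceptual sketch only, not the H3 or rHEALPix implementation:

```python
def compact(cells: set[str]) -> set[str]:
    """Toy quadkey compaction: whenever all four children of a parent
    are present, replace them with the parent; repeat until stable."""
    cells = set(cells)
    changed = True
    while changed:
        changed = False
        parents = {c[:-1] for c in cells if len(c) > 1}
        for p in parents:
            children = {p + d for d in "0123"}
            if children <= cells:
                cells -= children
                cells.add(p)
                changed = True
    return cells

# A feature covering one full parent cell ("02") plus one stray child:
print(sorted(compact({"020", "021", "022", "023", "031"})))
# -> ['02', '031']
```

Because vector2dggs compacts each feature independently, overlapping features each produce their own compacted cell sets, which is why the resulting index need not be unique.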

### For development
In brief, to get started:
- Install [Poetry](https://python-poetry.org/docs/basic-usage/)
- Install [GDAL](https://gdal.org/)
- If you're on Windows, `pip install gdal` may be necessary before running the subsequent commands.
- On Linux, install GDAL 3.8+ according to your platform-specific instructions, including development headers, i.e. `libgdal-dev`.
- Create the virtual environment with `poetry install`. This will install the necessary dependencies.
- If the installation of `s2geometry` fails, you may require SWIG to build it. (A command like `conda install swig` or `sudo dnf install swig` depending on your platform).
- Subsequently, the virtual environment can be re-activated with `poetry shell`.
If you run `poetry install`, the CLI tool will be aliased so you can simply use `vector2dggs` rather than `poetry run vector2dggs`, which is the alternative if you do not `poetry install`.
Alternatively, it is also possible to install using pip with `pip install -e .`, and bypass Poetry.
#### Code formatting
[black](https://github.com/psf/black)
Please run `black .` before committing.
## Example commands
With a local GPKG:
```bash
vector2dggs h3 -v DEBUG -id title_no -r 12 -o ~/Downloads/nz-property-titles.gpkg ~/Downloads/nz-property-titles.parquet
```
With a PostgreSQL/PostGIS connection:
```bash
vector2dggs h3 -v DEBUG -id ogc_fid -r 9 -pr 5 -t 4 --overwrite -lyr topo50_lake postgresql://user:password@host:port/db ./topo50_lake.parquet
```
## Citation
```bibtex
@software{vector2dggs,
title={{vector2dggs}},
author={Ardo, James and Law, Richard},
url={https://github.com/manaakiwhenua/vector2dggs},
version={0.10.0},
date={2023-04-20}
}
```
APA/Harvard
> Ardo, J., & Law, R. (2023). vector2dggs (0.10.0) [Computer software]. https://github.com/manaakiwhenua/vector2dggs
[manaakiwhenua-standards](https://github.com/manaakiwhenua/manaakiwhenua-standards)