Name | h3spark |
Version | 0.1.2 |
home_page | None |
Summary | Lightweight pyspark wrapper for h3-py |
upload_time | 2025-01-17 16:49:24 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT License |
keywords | h3, h3-py, pyspark, spark |
# h3spark
![Tile the world in hexes](images/big_geo.jpeg "Tile the world in hexes")
`h3spark` is a Python library that provides a set of native and user-defined functions (UDFs) for working with H3 geospatial indexing in PySpark. The functions in this library follow the same assumptions and rules as the native H3 functions, allowing for seamless integration and usage in PySpark data pipelines.
It also provides native implementations of some H3 functions that are more performant in PySpark than using UDFs. These functions are reimplemented in PySpark and avoid the serialization/deserialization overhead of a UDF.
## Installation
You can install `h3spark` using either pip or conda.
### Using pip
```bash
pip install h3spark
```
### Using conda
```bash
conda install -c conda-forge h3spark
```
## Usage
Below is a brief overview of the available functions in `h3spark`. These functions are designed to work with PySpark DataFrames and provide H3 functionality within a distributed data processing environment.
Some of the functions have been reimplemented in PySpark for performance; these can be imported from `h3spark.native`. The rest are wrappers around the native H3 functions and can be imported from `h3spark`. Note that the native reimplementations strive to match the original H3 functions as closely as possible, but they may differ in edge-case behavior and perform less input validation.
### Functions
`H3CellInput` is a type alias for an H3 cell, which can be either a hexadecimal string or a long integer (`H3CellInput = Union[str, int]`). h3spark converts between the two representations as required by h3. Prefer long integers where possible for more efficient processing.
- **`str_to_int(h3_str: string) -> long`**: Converts an H3 string to an integer.
- **`int_to_str(h3_int: Union[str, int]) -> string`**: Converts an H3 integer to a string. Allows strings due to Spark's limitation with unsigned 64-bit integers.
- **`get_num_cells(res: int) -> int`**: Returns the number of H3 cells at a given resolution.
- **`average_hexagon_area(res: int, unit: Union[AreaUnit, str] = AreaUnit.KM2) -> float`**: Calculates the average area of an H3 hexagon at a given resolution and unit.
- **`average_hexagon_edge_length(res: int, unit: Union[LengthUnit, str] = LengthUnit.KM) -> float`**: Computes the average edge length of an H3 hexagon at a specified resolution and unit.
- **`latlng_to_cell(lat: float, lng: float, res: int) -> long`**: Converts latitude and longitude to an H3 cell at a specified resolution.
- **`cell_to_latlng(cell: H3CellInput) -> COORDINATE_TYPE`**: Converts an H3 cell to its central latitude and longitude.
- **`get_resolution(cell: H3CellInput) -> short`**: Retrieves the resolution of a given H3 cell. _Has a pyspark native equivalent_
- **`cell_to_parent(cell: H3CellInput, res: int) -> long`**: Converts an H3 cell to its parent cell at a specified resolution. _Has a pyspark native equivalent if the cell's resolution and the parent resolution are literals_
- **`grid_distance(cell1: H3CellInput, cell2: H3CellInput) -> int`**: Calculates the distance in grid cells between two H3 cells.
- **`cell_to_boundary(cell: H3CellInput) -> BOUNDARY_TYPE`**: Returns the boundary of an H3 cell as a list of coordinates.
- **`grid_disk(cell: H3CellInput, k: int) -> List[long]`**: Returns all cells within k rings around the given H3 cell.
- **`grid_ring(cell: H3CellInput, k: int) -> List[long]`**: Returns cells in a ring of k distance from the given H3 cell.
- **`cell_to_children_size(cell: H3CellInput, res: int) -> int`**: Returns the number of children cells for a given cell at a specified resolution. _Has a pyspark native equivalent_
- **`cell_to_children(cell: H3CellInput, res: int) -> List[long]`**: Returns the children of an H3 cell at a specified resolution.
- **`cell_to_child_pos(child: H3CellInput, res_parent: int) -> int`**: Finds the position of a child cell relative to its parent cell at a specified resolution.
- **`child_pos_to_cell(parent: H3CellInput, res_child: int, child_pos: int) -> long`**: Converts a child position back to an H3 cell.
- **`compact_cells(cells: List[H3CellInput]) -> List[long]`**: Compacts a list of H3 cells.
- **`uncompact_cells(cells: List[H3CellInput], res: int) -> List[long]`**: Uncompacts a list of H3 cells to a specified resolution.
- **`h3shape_to_cells(shape: H3Shape, res: int) -> List[long]`**: Converts a shape to H3 cells at a specified resolution.
- **`cells_to_h3shape(cells: List[H3CellInput]) -> string`**: Converts a list of H3 cells to a GeoJSON shape.
- **`is_pentagon(cell: H3CellInput) -> bool`**: Checks if an H3 cell is a pentagon. _Has a pyspark native equivalent_
- **`get_base_cell_number(cell: H3CellInput) -> int`**: Retrieves the base cell number of an H3 cell. _Has a pyspark native equivalent_
- **`are_neighbor_cells(cell1: H3CellInput, cell2: H3CellInput) -> bool`**: Checks if two H3 cells are neighbors.
- **`grid_path_cells(start: H3CellInput, end: H3CellInput) -> List[long]`**: Finds the grid path between two H3 cells.
- **`is_res_class_III(cell: H3CellInput) -> bool`**: Checks if an H3 cell is of class III resolution.
- **`get_pentagons(res: int) -> List[long]`**: Returns all pentagon cells at a given resolution.
- **`get_res0_cells() -> List[long]`**: Returns all resolution 0 base cells.
- **`cell_to_center_child(cell: H3CellInput, res: int) -> long`**: Finds the center child cell of a given cell at a specified resolution.
- **`get_icosahedron_faces(cell: H3CellInput) -> List[int]`**: Retrieves icosahedron face indexes for a given H3 cell.
- **`cell_to_local_ij(cell: H3CellInput) -> List[int]`**: Converts an H3 cell to local IJ coordinates.
- **`local_ij_to_cell(origin: H3CellInput, i: int, j: int) -> long`**: Converts local IJ coordinates back to an H3 cell.
- **`cell_area(cell: H3CellInput, unit: Union[AreaUnit, str] = AreaUnit.KM2) -> float`**: Computes the area of an H3 cell in a specified unit.
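The two cell representations accepted by `H3CellInput` are interchangeable; `str_to_int`/`int_to_str` wrap what is, in effect, a base-16 conversion. A plain-Python illustration, using the well-known example cell `8928308280fffff` from the H3 documentation:

```python
# The same H3 cell in both representations accepted by H3CellInput.
cell_str = "8928308280fffff"   # hexadecimal string form
cell_int = int(cell_str, 16)   # 64-bit integer form (a Spark `long`)

assert cell_int == 0x8928308280fffff
assert format(cell_int, "x") == cell_str  # round-trips back to the string
```

Because the integer form is a fixed-width primitive, Spark can compare, sort, and join on it without any string handling, which is why the long representation is preferred.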
### Spark native Functions
Some H3 functions can mostly be reimplemented purely within PySpark. Doing so avoids the serialization/deserialization overhead of a UDF. These functions should be equivalent to their C-native counterparts in all but rare edge cases, while being more performant in PySpark. You can import them from `h3spark.native`.
- **`get_resolution(cell: long) -> long`**: Retrieves the resolution of a given H3 cell.
- **`cell_to_parent_fixed(cell: long, current_resolution: int, parent_resolution: int) -> long`**: Given a column where every row has the same resolution (`current_resolution`), converts every cell to its parent at the constant resolution `parent_resolution`. Performs no validation on the input cells.
- **`get_base_cell(cell: long) -> long`**: Retrieves the base cell number of an H3 cell.
- **`is_pentagon(cell: long) -> bool`**: Checks if an H3 cell is a pentagon.
- **`cell_to_children_size(cell: long, res: int, validate_resolution: Optional[bool] = False) -> int`**: Returns the number of children cells for a given cell at a specified resolution. If `validate_resolution` is set to True, it will throw an error if the resolution of the input cell is less than the requested child resolution.
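These reimplementations are possible because an H3 index stores its fields at fixed bit positions (resolution in bits 52–55, base cell number in bits 45–51, per the H3 index layout), so reading them reduces to bitwise arithmetic that Spark can evaluate natively. A pure-Python sketch of that arithmetic follows; the exact Spark expressions h3spark uses are an assumption:

```python
# Field extraction per the H3 index bit layout:
#   resolution -> bits 52-55
#   base cell  -> bits 45-51
# In Spark these become expressions like
# F.shiftright(col, 52).bitwiseAND(F.lit(0xF)) -- no Python UDF involved.
def get_resolution(cell: int) -> int:
    return (cell >> 52) & 0xF

def get_base_cell(cell: int) -> int:
    return (cell >> 45) & 0x7F

cell = 0x8928308280fffff  # a resolution-9 cell over San Francisco
assert get_resolution(cell) == 9
assert get_base_cell(cell) == 20
```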
### Convenience functions
We provide some functions that wrap other h3 functions to streamline commonly used operations. You can import them from `h3spark.convenience`.
- **`min_child(cell: H3CellInput, resolution: int) -> long`**: Returns the minimum-valued child of the input H3 cell at the specified resolution.
- **`max_child(cell: H3CellInput, resolution: int) -> long`**: Returns the maximum-valued child of the input H3 cell at the specified resolution.
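To make the semantics concrete: in the H3 index layout, child digits below a cell's own resolution are 3-bit fields (unused digits are set to 7), so the minimum-valued child zeroes those digits and the maximum-valued child sets them to 6, the largest valid digit. A pure-Python sketch of that definition — an illustration of the semantics, not necessarily how `h3spark.convenience` implements it:

```python
def _with_child_digits(cell: int, resolution: int, digit: int) -> int:
    """Set the resolution field and fill the child digits below the
    parent's resolution with `digit` (digit d sits at bit offset 3*(15-d))."""
    parent_res = (cell >> 52) & 0xF
    out = (cell & ~(0xF << 52)) | (resolution << 52)  # new resolution field
    for d in range(parent_res + 1, resolution + 1):
        shift = 3 * (15 - d)
        out = (out & ~(0b111 << shift)) | (digit << shift)
    return out

def min_child(cell: int, resolution: int) -> int:
    return _with_child_digits(cell, resolution, 0)  # digit 0 = center child

def max_child(cell: int, resolution: int) -> int:
    return _with_child_digits(cell, resolution, 6)  # 6 is the largest digit

cell = 0x8928308280fffff  # resolution 9
assert min_child(cell, 10) == 0x8a28308280c7fff
assert max_child(cell, 10) == 0x8a28308280f7fff
```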
## License
This library is released under the MIT License. See the [LICENSE](LICENSE) file for more details.
## Contributing
Contributions are welcome! Please submit a pull request or open an issue to contribute to the project.
## Acknowledgments
This library is built on top of the H3 geospatial indexing library and PySpark. Special thanks to the developers of these libraries for their contributions to the open-source community.
For more information, check the [official H3 documentation](https://h3geo.org/docs/) and [PySpark documentation](https://spark.apache.org/docs/latest/api/python/index.html).
## Building + Deploying
```sh
python -m build
python -m twine upload --verbose --repository pypi dist/*
```