# FastPyNUTS
A fast implementation for querying the [NUTS - Nomenclature of territorial units for statistics](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) dataset by location, particularly useful for large-scale applications.
![Figure: NUTS levels (Eurostat)](img/levels.gif) <br>
Figure: [_Eurostat_](https://ec.europa.eu/eurostat/documents/7116161/7117206/NUTS-layers.gif)
## Features
- fast querying of NUTS regions (~0.3 ms/query)
- find all NUTS regions containing a point, or restrict the query to user-defined NUTS levels (0-3)
- use your own custom NUTS dataset (other CRS, enriched metadata, etc.)
## Installation
```cmd
pip install fastpynuts
```
`FastPyNUTS` requires `numpy`, `shapely`, `treelib` and `rtree`.
## Usage
#### Initialization and finding NUTS regions
The `NUTSfinder` class is the main tool to determine the NUTS regions of a point. It can be initialized from a local file
containing the NUTS regions, or via automatic download from [Eurostat](https://gisco-services.ec.europa.eu/distribution/v2/nuts).
```python
from fastpynuts import NUTSfinder
# construct from local file
nf = NUTSfinder("PATH_TO_LOCAL_FILE.geojson")
# retrieve data automatically (the file is downloaded to '.data', or read from there if it already exists)
nf = NUTSfinder.from_web(scale=1, year=2021, epsg=4326)
# find NUTS regions
point = (11.57, 48.13)
regions = nf.find(*point) # find all regions via a point
bbox = (11.57, 48.13, 11.62, 49.) # lon_min, lat_min, lon_max, lat_max
regions = nf.find_bbox(*bbox) # find all regions via a bbox
geom = {
    "type": "Polygon",
    "coordinates": [
        [
            [11.595733032762524, 48.11837184946995],
            [11.631858436052113, 48.14289890153063],
            [11.627498473585405, 48.16409081247133],
            [11.595733032762524, 48.11837184946995]
        ]
    ]
}
regions = nf.find_bbox() # find all regions via a GeoJSON geometry (supports shapely geometries and all objects that can be converted into one)
# filter for regions of specific levels
level3 = nf.filter_levels(regions, 3)
level2or3 = nf.filter_levels(regions, 2, 3)
```
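To make the level filtering concrete, here is a minimal standalone sketch of the same idea. It is not the library's implementation: `Region` and `filter_by_levels` are hypothetical names that only model the `id` and `level` attributes, so the snippet runs without a NUTS dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a NUTSregion; only `id` and `level` are modelled."""
    id: str
    level: int


def filter_by_levels(regions, *levels):
    """Keep the regions whose level is one of `levels`, preserving input order."""
    wanted = set(levels)
    return [region for region in regions if region.level in wanted]


# Same hierarchy as in the example point above (Munich).
regions = [Region("DE", 0), Region("DE2", 1), Region("DE21", 2), Region("DE212", 3)]

level3 = filter_by_levels(regions, 3)
level2or3 = filter_by_levels(regions, 2, 3)

print([region.id for region in level3])     # ['DE212']
print([region.id for region in level2or3])  # ['DE21', 'DE212']
```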
#### Assessing the results
The NUTS regions are returned as a list of `NUTSregion` objects, ordered by NUTS level.
```python
>>> regions
[NUTS0: DE, NUTS1: DE2, NUTS2: DE21, NUTS3: DE212]
```
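The IDs in this list follow the NUTS coding scheme: a two-letter country code followed by one extra character per level. The sketch below derives the level and the chain of parent IDs from an ID string alone. `nuts_level` and `nuts_hierarchy` are hypothetical helpers for illustration and are not part of `FastPyNUTS`.

```python
def nuts_level(nuts_id):
    """Return the NUTS level (0-3) implied by the length of a NUTS ID."""
    if not 2 <= len(nuts_id) <= 5:
        raise ValueError(f"not a valid NUTS ID: {nuts_id!r}")
    return len(nuts_id) - 2


def nuts_hierarchy(nuts_id):
    """Return the IDs of all enclosing regions, from level 0 down to `nuts_id`."""
    return [nuts_id[: 2 + level] for level in range(nuts_level(nuts_id) + 1)]


print(nuts_level("DE21"))       # 2
print(nuts_hierarchy("DE212"))  # ['DE', 'DE2', 'DE21', 'DE212']
```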
Each region object holds information about
- its ID and NUTS level
```python
>>> region = regions[0]
>>> region.id
'DE'
>>> region.level
0
```
- its geometry (a `shapely` Polygon or MultiPolygon) and the corresponding bounding box
```python
>>> region.geom
<MULTIPOLYGON (((10.454 47.556, 10.44 47.525, 10.441 47.514, 10.432 47.504, ...>
>>> region.bbox
(5.867697, 47.270114, 15.04116, 55.058165)
```
- further fields from the NUTS dataset and the original input feature in GeoJSON format
```python
>>> region.properties
{
    "NUTS_ID": "DE",
    "LEVL_CODE": 0,
    "CNTR_CODE": "DE",
    "NAME_LATN": "Deutschland",
    "NUTS_NAME": "Deutschland",
    "MOUNT_TYPE": 0,
    "URBN_TYPE": 0,
    "COAST_TYPE": 0,
    "FID": "DE"
}
>>> region.feature
{
    'type': 'Feature',
    'geometry': {
        'type': 'MultiPolygon',
        'coordinates': [
            [
                [
                    [10.454439, 47.555797],
                    ...
                ]
            ]
        ],
    },
    'properties': {
        "NUTS_ID": "DE",
        ...
    }
}
```
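The bounding box shown above is in `(lon_min, lat_min, lon_max, lat_max)` order. As a rough illustration of how such a box relates to the GeoJSON coordinates of a feature, the following dependency-free sketch walks the nested coordinate arrays. `iter_positions` and `geojson_bbox` are hypothetical helpers, not the library's code; with `shapely` available, the `bounds` attribute of a geometry gives the same four values.

```python
def iter_positions(coords):
    """Yield every [lon, lat] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(geometry):
    """Return (lon_min, lat_min, lon_max, lat_max) of a GeoJSON geometry dict."""
    lons, lats = zip(*((p[0], p[1]) for p in iter_positions(geometry["coordinates"])))
    return (min(lons), min(lats), max(lons), max(lats))


# The polygon used in the usage example above.
geom = {
    "type": "Polygon",
    "coordinates": [
        [
            [11.595733032762524, 48.11837184946995],
            [11.631858436052113, 48.14289890153063],
            [11.627498473585405, 48.16409081247133],
            [11.595733032762524, 48.11837184946995],
        ]
    ],
}

print(geojson_bbox(geom))
```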
## Advanced Usage
```python
# apply a buffer to the input regions to catch points on the boundary (for further info on the buffering, see the documentation)
nf = NUTSfinder("PATH_TO_LOCAL_FILE.geojson", buffer_geoms=1e-5)
# only load certain levels of regions (here levels 2 and 3)
nf = NUTSfinder("PATH_TO_LOCAL_FILE.geojson", min_level=2, max_level=3)
# if the point to be queried is guaranteed to lie within a NUTS region, setting valid_point to True may speed up the runtime
regions = nf.find(*point, valid_point=True)
```
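When preparing a custom dataset, the same level restriction can also be applied to the GeoJSON file itself before it is loaded. The sketch below assumes the features carry a `LEVL_CODE` property, as in the Eurostat files shown above. `select_levels` is a hypothetical helper and the tiny collection is hand-made, with geometries set to `None` to keep it short.

```python
def select_levels(feature_collection, min_level=0, max_level=3):
    """Return a copy of a GeoJSON FeatureCollection restricted to the given NUTS levels."""
    kept = [
        feature
        for feature in feature_collection["features"]
        if min_level <= feature["properties"]["LEVL_CODE"] <= max_level
    ]
    # Copy the top-level keys and swap in the filtered feature list.
    return {**feature_collection, "features": kept}


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"NUTS_ID": "DE", "LEVL_CODE": 0}},
        {"type": "Feature", "geometry": None, "properties": {"NUTS_ID": "DE2", "LEVL_CODE": 1}},
        {"type": "Feature", "geometry": None, "properties": {"NUTS_ID": "DE21", "LEVL_CODE": 2}},
        {"type": "Feature", "geometry": None, "properties": {"NUTS_ID": "DE212", "LEVL_CODE": 3}},
    ],
}

subset = select_levels(collection, min_level=2, max_level=3)
print([feature["properties"]["NUTS_ID"] for feature in subset["features"]])  # ['DE21', 'DE212']
```

The input collection is left unchanged, so the filtered copy can be written to a separate file with `json.dump` and passed to `NUTSfinder` like any other local file.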
## Runtime Comparison
`FastPyNUTS` is optimized for query speed and result correctness, at the cost of a longer initialization time.
An R-tree-based approach proved to be the fastest option:
<table>
<tr>
<td> <img src="img/benchmark_1.png" alt="Benchmark for scale 1."> </td>
<td> <img src="img/benchmark_20_zoom.png" alt="Benchmark for scale 20 (zoomed)."> </td>
</tr>
</table>
Compared to other packages like [nuts-finder](https://github.com/nestauk/nuts_finder), a large performance boost can be achieved:
![Benchmark against other packages](img/benchmark_other.png)
**Tips**:
- if interested only in certain levels (0-3) of the NUTS dataset, initialize the `NUTSfinder` using its `min_level` and `max_level` arguments
- if it's known beforehand that the queried point lies within the interior of a NUTS region, use `find(valid_point=True)`
For a full runtime analysis, see [benchmark.ipynb](benchmark.ipynb).
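The general idea behind an R-tree-based lookup is a two-stage query: a cheap bounding-box filter selects candidate regions, and only those candidates go through the exact point-in-polygon test. The sketch below illustrates that pattern under simplifying assumptions. It is not the `FastPyNUTS` implementation: it replaces the R-tree with a linear scan over precomputed bounding boxes, uses a plain ray-casting test instead of `shapely`, and all names and the two toy regions are made up.

```python
def point_in_ring(lon, lat, ring):
    """Even-odd ray-casting test against a closed ring of (lon, lat) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def bbox_of(ring):
    """Return (lon_min, lat_min, lon_max, lat_max) of a ring."""
    lons, lats = zip(*ring)
    return min(lons), min(lats), max(lons), max(lats)


def find_regions(lon, lat, regions):
    """Stage 1: bounding-box filter. Stage 2: exact test on the surviving candidates."""
    hits = []
    for name, ring, (lon_min, lat_min, lon_max, lat_max) in regions:
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            if point_in_ring(lon, lat, ring):
                hits.append(name)
    return hits


# Two toy regions whose bounding boxes overlap.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
triangle = [(3, 3), (7, 3), (7, 7), (3, 3)]
regions = [(name, ring, bbox_of(ring)) for name, ring in [("square", square), ("triangle", triangle)]]

print(find_regions(1, 1, regions))      # ['square']
print(find_regions(3.5, 3.9, regions))  # ['square'] (inside the triangle's bbox, but not the triangle)
print(find_regions(6, 4, regions))      # ['triangle']
```

An R-tree speeds up the first stage by avoiding the scan over every bounding box, which matters once the dataset holds many regions.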
## Contributors
- [Colin Moldenhauer](https://github.com/ColinMoldenhauer/)
- [meengel](https://github.com/meengel)