# spatial-kfold
[![License: GPL-3.0](https://img.shields.io/badge/License-GPL--3.0-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![pypi](https://img.shields.io/pypi/v/spatial-kfold.svg)](https://pypi.org/project/spatial-kfold/)
[![Downloads](https://static.pepy.tech/badge/spatial-kfold)](https://pepy.tech/project/spatial-kfold)
spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies.
spatial-kfold is a Python library for spatial resampling that supports more robust cross-validation in spatial studies. It offers spatial clustering and block resampling techniques with user-friendly parameters to customize the resampling. It lets users conduct "Leave Region Out" cross-validation, which is useful for evaluating a model's generalization to new locations and for improving the reliability of [feature selection](https://doi.org/10.1016/j.ecolmodel.2019.108815) and [hyperparameter tuning](https://doi.org/10.1016/j.ecolmodel.2019.06.002) in spatial studies.
spatial-kfold can be integrated easily with scikit-learn's [LeaveOneGroupOut](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneGroupOut.html) cross-validation technique. This integration enables you to further leverage the resampled spatial data for performing feature selection and hyperparameter tuning.
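
As a rough illustration of that integration, the sketch below passes fold labels to scikit-learn's `LeaveOneGroupOut` through the `groups` argument. The data here is synthetic and the `groups` array is a hypothetical stand-in for a `folds` column produced by spatial-kfold; the model and scoring choice are assumptions made only for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 samples, 3 predictors (not the Ames data).
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# Hypothetical fold labels 1..5, standing in for a 'folds' column.
groups = np.repeat(np.arange(1, 6), 12)

logo = LeaveOneGroupOut()
n_splits = logo.get_n_splits(groups=groups)

# Each split holds out every sample of exactly one fold ("region").
held_out = [np.unique(groups[test_idx]).tolist()
            for _, test_idx in logo.split(X, y, groups)]

# One score per held-out fold.
scores = cross_val_score(LinearRegression(), X, y, groups=groups, cv=logo,
                         scoring="neg_mean_squared_error")
print(n_splits, held_out, scores.shape)
```

The same `cv=logo` and `groups=...` pair can be handed to scikit-learn tools that accept a cross-validation splitter, such as `GridSearchCV` (via `fit(X, y, groups=groups)`).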
# Main Features
spatial-kfold supports "Leave Region Out" cross-validation using two spatial resampling techniques:
1. Spatial clustering with KMeans or BisectingKMeans
2. Spatial blocks
    * Random blocks
    * Continuous blocks
        * tb-lr: top-bottom, left-right
        * bt-rl: bottom-top, right-left
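
To make the block idea concrete, here is a conceptual, standard-library-only sketch of random block resampling: a regular grid is laid over the points, each occupied block gets a random fold label, and every point inherits the label of the block that contains it. This is not the library's implementation; the function name `assign_block_folds` and its parameters are hypothetical and exist only for this illustration.

```python
import math
import random


def assign_block_folds(points, width, height, nfolds, seed=0):
    """Map each (x, y) point to a fold via the grid block that contains it."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    # Column/row index of the block containing each point.
    cells = [(math.floor((x - min_x) / width), math.floor((y - min_y) / height))
             for x, y in points]
    rng = random.Random(seed)
    # One random fold label (1..nfolds) per occupied block.
    block_fold = {cell: rng.randint(1, nfolds) for cell in sorted(set(cells))}
    return [block_fold[cell] for cell in cells]


# Hypothetical projected coordinates (same units as width/height).
points = [(0, 0), (100, 200), (1600, 50), (1700, 1400), (3100, 3100)]
folds = assign_block_folds(points, width=1500, height=1500, nfolds=3, seed=42)
print(folds)
```

Points that fall in the same block always share a fold, which is what keeps nearby observations on the same side of a train/test split.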
# Installation
spatial-kfold can be installed from [PyPI](https://pypi.org/project/spatial-kfold/):
```
pip install spatial-kfold
```
# Example
## 1. Spatial clustering with KMeans [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)
```python
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

from spatialkfold.blocks import spatial_blocks
from spatialkfold.datasets import load_ames
from spatialkfold.clusters import spatial_kfold_clusters

# Load the Ames data and reproject it to the estimated UTM CRS
ames = load_ames()
ames_prj = ames.copy().to_crs(ames.estimate_utm_crs())
ames_prj['id'] = range(len(ames_prj))

# 1. Spatial cluster resampling
ames_clusters = spatial_kfold_clusters(gdf=ames_prj, name='id', nfolds=10,
                                       algorithm='kmeans', random_state=569)

# Get 10 colors from the 'tab20' colormap
# (plt.get_cmap is used because cm.get_cmap was deprecated in Matplotlib 3.7 and removed in 3.9)
cols_tab = plt.get_cmap('tab20', 10)
# Generate a list of colors from the colormap
cols = [cols_tab(i) for i in range(10)]
# Create a color ramp
color_ramp = ListedColormap(cols)

fig, ax = plt.subplots(1, 1, figsize=(9, 4))
ames_clusters.plot(column='folds', ax=ax, cmap=color_ramp, markersize=2, legend=True)
ax.set_title('Spatially Clustered Folds')
plt.show()
```
<p align="center">
<img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/clusters_resampling.png?raw=true" width="400" />
</p>
## 2. Spatial blocks [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)
```python
# 2.1 Spatially resampled random blocks
# (reuses ames_prj and color_ramp from the clustering example)

# Create 1500 x 1500 blocks randomly assigned to 10 folds
ames_rnd_blocks = spatial_blocks(gdf=ames_prj, width=1500, height=1500,
                                 method='random', nfolds=10,
                                 random_state=135)

# Resample the Ames data with the prepared blocks
ames_res_rnd_blk = gpd.overlay(ames_prj, ames_rnd_blocks)

# Plot the resampled blocks
fig, ax = plt.subplots(1, 2, figsize=(10, 6))

# Plot 1: the blocks colored by fold, with the sample points on top
ames_rnd_blocks.plot(column='folds', cmap=color_ramp, ax=ax[0], lw=0.7, legend=False)
ames_prj.plot(ax=ax[0], markersize=1, color='r')
ax[0].set_title('Random Blocks Folds')

# Plot 2: the sample points colored by the fold of their block
ames_rnd_blocks.plot(facecolor="none", edgecolor='grey', ax=ax[1], lw=0.7, legend=False)
ames_res_rnd_blk.plot(column='folds', cmap=color_ramp, legend=False, ax=ax[1], markersize=3)
ax[1].set_title('Spatially Resampled\nrandom blocks')

# Scatter the same points again to get a mappable for the colorbar
im1 = ax[1].scatter(ames_res_rnd_blk.geometry.x, ames_res_rnd_blk.geometry.y,
                    c=ames_res_rnd_blk['folds'], cmap=color_ramp, s=5)

axins1 = inset_axes(
    ax[1],
    width="5%",    # width: 5% of parent_bbox width
    height="50%",  # height: 50%
    loc="lower left",
    bbox_to_anchor=(1.05, 0, 1, 2),
    bbox_transform=ax[1].transAxes,
    borderpad=0
)
fig.colorbar(im1, cax=axins1, ticks=range(1, 11))

plt.show()
```
<p align="center">
<img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/blocks_resampling.png?raw=true" width="700" />
</p>
## 3. Compare Random and Spatial cross validation [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)
<p align="center">
<img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/randomCV_spatialCV.png?raw=true" width="800" />
</p>
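
A comparison of this kind can be sketched with scikit-learn alone. The example below uses synthetic, spatially clustered data (not the Ames data or the notebook's code), and the region labels, model, and metric are all assumptions chosen for illustration. It scores the same model with shuffled `KFold` and with `LeaveOneGroupOut` over the region labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in data: 5 spatial clusters ("regions") of 40 points each.
n_regions, n_per_region = 5, 40
centers = rng.uniform(0, 10_000, size=(n_regions, 2))
groups = np.repeat(np.arange(1, n_regions + 1), n_per_region)
coords = centers[groups - 1] + rng.normal(scale=300, size=(len(groups), 2))

# The target varies smoothly in space, so nearby points look alike.
y = (np.sin(coords[:, 0] / 2_000) + np.cos(coords[:, 1] / 2_000)
     + rng.normal(scale=0.05, size=len(groups)))
X = coords  # coordinates used as predictors, purely for illustration

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Random CV: neighbours of a test point usually remain in the training set.
random_cv = KFold(n_splits=n_regions, shuffle=True, random_state=0)
random_scores = cross_val_score(model, X, y, cv=random_cv,
                                scoring="neg_root_mean_squared_error")

# Spatial CV: a whole region is held out at a time.
spatial_scores = cross_val_score(model, X, y, groups=groups,
                                 cv=LeaveOneGroupOut(),
                                 scoring="neg_root_mean_squared_error")

print("random CV RMSE :", -random_scores.mean())
print("spatial CV RMSE:", -spatial_scores.mean())
```

With spatially autocorrelated data, random CV tends to report a lower error than spatial CV because test points have close neighbours in the training set; the exact numbers depend on the data and the model.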
# Credits
This package was inspired by the following R packages:
* [CAST](https://github.com/HannaMeyer/CAST/)
* [spatialsample](https://github.com/tidymodels/spatialsample/)
# Dependencies
This project relies on the following dependencies:
* [pandas](https://pandas.pydata.org)
* [numpy](https://numpy.org)
* [geopandas](https://geopandas.org)
* [shapely](https://shapely.readthedocs.io)
* [matplotlib](https://matplotlib.org)
* [scikit-learn](https://scikit-learn.org)
# Citation
If you use spatial-kfold in your research or work, please cite it using one of the following entries:
- MLA Style:
```
Ghariani, Walid. "spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies." 2023. GitHub, https://github.com/WalidGharianiEAGLE/spatial-kfold
```
- BibTeX Style:
```
@Misc{spatial-kfold,
author = {Walid Ghariani},
title = {spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies},
howpublished = {GitHub},
year = {2023},
url = {https://github.com/WalidGharianiEAGLE/spatial-kfold}
}
```
# Resources
A list of tutorials and resources, mainly in R, explaining the importance of spatial resampling and spatial cross-validation:
* [Hanna Meyer: "Machine-learning based modelling of spatial and spatio-temporal data"](https://www.youtube.com/watch?v=QGjdS1igq78&t=1271s)
* [Jannes Münchow: "The importance of spatial cross-validation in predictive modeling"](https://www.youtube.com/watch?v=1rSoiSb7xbw&t=649s)
* [Julia Silge: "Spatial resampling for more reliable model evaluation with geographic data"](https://www.youtube.com/watch?v=wVrcw_ek3a4&t=904s)
# Bibliography
Meyer, Hanna, et al. "Importance of spatial predictor variable selection in machine learning applications - Moving from data reproduction to spatial prediction." Ecological Modelling 411 (2019): 108815. https://doi.org/10.1016/j.ecolmodel.2019.108815

Schratz, Patrick, et al. "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data." Ecological Modelling 406 (2019): 109-120. https://doi.org/10.1016/j.ecolmodel.2019.06.002

Schratz, Patrick, et al. "mlr3spatiotempcv: Spatiotemporal resampling methods for machine learning in R." arXiv preprint arXiv:2110.12674 (2021). https://arxiv.org/abs/2110.12674

Valavi, Roozbeh, et al. "blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models." bioRxiv (2018): 357798. https://doi.org/10.1101/357798