spatial-kfold

* Name: spatial-kfold
* Version: 0.0.3
* Home page: https://github.com/WalidGharianiEAGLE/spatial-kfold
* Author: Walid Ghariani
* License: GPL-3.0
* Requires Python: >=3.7
* Upload time: 2023-10-26 09:00:48
* Keywords: cross-validation, machine-learning, gis, spatial

# spatial-kfold
[![License: GPL-3.0](https://img.shields.io/badge/License-GPL--3.0-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![pypi](https://img.shields.io/pypi/v/spatial-kfold.svg)](https://pypi.org/project/spatial-kfold/)
[![Downloads](https://static.pepy.tech/badge/spatial-kfold)](https://pepy.tech/project/spatial-kfold)

spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies.

spatial-kfold is a Python library for spatial resampling that makes cross-validation in spatial studies more robust. It offers spatial clustering and spatial block resampling techniques, with user-friendly parameters to customize the resampling. It enables "Leave Region Out" cross-validation, which is useful for evaluating how well a model generalizes to new locations and for improving the reliability of [feature selection](https://doi.org/10.1016/j.ecolmodel.2019.108815) and [hyperparameter tuning](https://doi.org/10.1016/j.ecolmodel.2019.06.002) in spatial studies.
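
To make "Leave Region Out" concrete: each distinct region (fold) label is held out once as the test set, while all other regions are used for training. A minimal, dependency-free sketch of that split logic, assuming the fold labels are already known; `leave_region_out` is a hypothetical helper for illustration, not part of spatial-kfold's API, and the labels are made up.

```python
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_region_out(
    regions: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_region, train_idx, test_idx) once per distinct region.

    Every sample whose region equals the held-out label goes to the test set;
    all remaining samples form the training set.
    """
    # Keep first-seen order so the sequence of splits is reproducible.
    order: Dict[Hashable, None] = dict.fromkeys(regions)
    for held_out in order:
        test_idx = [i for i, r in enumerate(regions) if r == held_out]
        train_idx = [i for i, r in enumerate(regions) if r != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical fold labels, e.g. as produced by a spatial resampling step.
folds = [1, 1, 2, 2, 2, 3]
for region, train, test in leave_region_out(folds):
    print(region, train, test)
# → 1 [2, 3, 4, 5] [0, 1]
# → 2 [0, 1, 5] [2, 3, 4]
# → 3 [0, 1, 2, 3, 4] [5]
```

Because whole regions are held out, no test sample ever has a training neighbour from its own region, which is what separates this from ordinary random k-fold.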


spatial-kfold integrates easily with scikit-learn's [LeaveOneGroupOut](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneGroupOut.html) cross-validator: the fold labels it produces can be passed as the `groups` argument. This lets you reuse the spatially resampled data for feature selection and hyperparameter tuning.
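
A minimal sketch of that integration for hyperparameter tuning, assuming the fold labels are available as an array (the examples in this README store them in a `folds` column). The data are synthetic, and the `Ridge` model and `alpha` grid are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): in practice X and y come from your resampled
# GeoDataFrame and `groups` from its 'folds' column.
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n)
groups = np.repeat([1, 2, 3, 4, 5], n // 5)  # five hypothetical spatial folds

logo = LeaveOneGroupOut()
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=logo,
    scoring="neg_root_mean_squared_error",
)
# `groups` is forwarded to LeaveOneGroupOut, so each split holds out one whole fold
search.fit(X, y, groups=groups)

print(logo.get_n_splits(X, y, groups))  # → 5 (one split per distinct fold label)
print(search.best_params_)
```

The same `cv=LeaveOneGroupOut()` plus `groups=...` pattern applies to other scikit-learn tools that accept a cross-validator, such as `cross_val_score`.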

# Main Features

spatial-kfold supports "Leave Region Out" cross-validation with two spatial resampling techniques:

1. Spatial clustering with KMeans or BisectingKMeans
2. Spatial blocks
    * Random blocks
    * Continuous blocks
        * tb-lr: top-bottom, left-right
        * bt-rl: bottom-top, right-left
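
The block options differ in how grid blocks are mapped to folds. The sketch shows one plausible mapping on a grid of `(row, col)` blocks. `assign_block_folds` is hypothetical, and both the scan orders (tb-lr read as row by row from the top-left corner, bt-rl as its exact reverse) and the round-robin balancing for random blocks are assumptions, not spatial-kfold's documented behaviour.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col); row 0 is the top row of the grid


def assign_block_folds(
    n_rows: int, n_cols: int, nfolds: int, method: str = "random", seed: int = 0
) -> Dict[Cell, int]:
    """Hypothetical helper: map each grid block to a fold label in 1..nfolds."""
    cells: List[Cell] = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    if method == "random":
        # Shuffle the blocks, then deal them out round-robin so fold sizes stay balanced
        random.Random(seed).shuffle(cells)
        return {cell: i % nfolds + 1 for i, cell in enumerate(cells)}
    if method == "tb-lr":
        ordered = cells  # top row first, each row scanned left to right (assumed)
    elif method == "bt-rl":
        ordered = cells[::-1]  # bottom row first, each row scanned right to left (assumed)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Continuous: consecutive blocks in the scan order share a fold label
    n = len(ordered)
    return {cell: i * nfolds // n + 1 for i, cell in enumerate(ordered)}


grid = assign_block_folds(2, 3, nfolds=3, method="tb-lr")
print([grid[(0, c)] for c in range(3)], [grid[(1, c)] for c in range(3)])
# → [1, 1, 2] [2, 3, 3]
```

Continuous blocks keep each fold spatially compact, whereas random blocks scatter a fold's blocks across the study area.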

# Installation

spatial-kfold can be installed from [PyPI](https://pypi.org/project/spatial-kfold/)

```
pip install spatial-kfold
```

# Example 

## 1. Spatial clustering with KMeans [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from mpl_toolkits.axes_grid1.inset_locator import inset_axes  # used in the blocks example

from spatialkfold.blocks import spatial_blocks  # used in the blocks example
from spatialkfold.datasets import load_ames
from spatialkfold.clusters import spatial_kfold_clusters

# Load the Ames data and project it to a local UTM CRS so coordinates are in metres
ames = load_ames()
ames_prj = ames.copy().to_crs(ames.estimate_utm_crs())
ames_prj['id'] = range(len(ames_prj))

# 1. Spatial cluster resampling: group the points into 10 spatial clusters (folds)
ames_clusters = spatial_kfold_clusters(gdf=ames_prj, name='id', nfolds=10,
                                       algorithm='kmeans', random_state=569)

# Build a 10-colour colormap from 'tab20'
# (plt.get_cmap is used because matplotlib.cm.get_cmap was deprecated in
# Matplotlib 3.7 and removed in 3.9)
cols_tab = plt.get_cmap('tab20', 10)
cols = [cols_tab(i) for i in range(10)]
color_ramp = ListedColormap(cols)

fig, ax = plt.subplots(1, 1, figsize=(9, 4))
ames_clusters.plot(column='folds', ax=ax, cmap=color_ramp, markersize=2, legend=True)
ax.set_title('Spatially Clustered Folds')
plt.show()
```

<p align="center">
  <img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/clusters_resampling.png?raw=true" width="400" />
</p>
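
Conceptually, cluster-based resampling clusters the projected x/y coordinates and uses each cluster ID as a fold label. A minimal sketch of that idea with scikit-learn's `KMeans` on synthetic coordinates; it is not spatial-kfold's implementation, and the point clouds are made up. scikit-learn's `BisectingKMeans` (scikit-learn 1.1+) could be swapped in the same way.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic projected coordinates in metres (assumption): three separated point clouds
centers = np.array([[0.0, 0.0], [5000.0, 0.0], [0.0, 5000.0]])
xy = np.vstack([c + rng.normal(scale=200.0, size=(50, 2)) for c in centers])

# Cluster the coordinates; each cluster becomes one spatial fold
nfolds = 3
km = KMeans(n_clusters=nfolds, n_init=10, random_state=569).fit(xy)
folds = km.labels_ + 1  # shift to 1-based fold labels

print(np.bincount(folds)[1:])  # points per fold
```

Clustering on projected rather than geographic coordinates keeps the Euclidean distances that KMeans minimises meaningful, which is one reason the example above reprojects the Ames data to UTM first.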

## 2. Spatial blocks [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)

```python
# 2.1 Spatially resampled random blocks

# Create 1500 x 1500 blocks (in CRS units, i.e. metres for the UTM-projected data)
# and randomly assign them to 10 folds
ames_rnd_blocks = spatial_blocks(gdf=ames_prj, width=1500, height=1500,
                                 method='random', nfolds=10,
                                 random_state=135)

# Resample the Ames points with the blocks: each point inherits the 'folds' value
# of the block it falls in
ames_res_rnd_blk = gpd.overlay(ames_prj, ames_rnd_blocks)

# Plot the blocks and the resampled points
fig, ax = plt.subplots(1, 2, figsize=(10, 6))

# Plot 1: blocks coloured by fold, with the original points in red
ames_rnd_blocks.plot(column='folds', cmap=color_ramp, ax=ax[0], lw=0.7, legend=False)
ames_prj.plot(ax=ax[0], markersize=1, color='r')
ax[0].set_title('Random Blocks Folds')

# Plot 2: block outlines, with points coloured by the fold they inherited
ames_rnd_blocks.plot(facecolor='none', edgecolor='grey', ax=ax[1], lw=0.7, legend=False)
ames_res_rnd_blk.plot(column='folds', cmap=color_ramp, legend=False, ax=ax[1], markersize=3)
ax[1].set_title('Spatially Resampled\nrandom blocks')

# Scatter the same points again to obtain a mappable for the colorbar
im1 = ax[1].scatter(ames_res_rnd_blk.geometry.x, ames_res_rnd_blk.geometry.y,
                    c=ames_res_rnd_blk['folds'], cmap=color_ramp, s=5)

# Place the colorbar in an inset axes just to the right of ax[1]
axins1 = inset_axes(
    ax[1],
    width="5%",    # 5% of the anchor box width
    height="50%",  # 50% of the anchor box height (the box is 2x the axes height)
    loc="lower left",
    bbox_to_anchor=(1.05, 0, 1, 2),
    bbox_transform=ax[1].transAxes,
    borderpad=0,
)
fig.colorbar(im1, cax=axins1, ticks=range(1, 11))

plt.show()
```

<p align="center">
  <img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/blocks_resampling.png?raw=true" width="700" />
</p>
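
The `gpd.overlay` call in the blocks example attaches each point to the block polygon it falls in. For an axis-aligned grid the same lookup reduces to integer division of the coordinates. A minimal sketch, assuming equal-sized blocks anchored at the grid's top-left corner `(xmin, ymax)`; `block_of`, the grid origin, and the block-to-fold table are hypothetical and not spatial-kfold's implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int]  # (row, col); row 0 is the top row


def block_of(x: float, y: float, xmin: float, ymax: float,
             width: float, height: float) -> Block:
    """Return the (row, col) of the axis-aligned grid block containing (x, y).

    The grid starts at the top-left corner (xmin, ymax). A point on a shared
    edge is assigned to the block to its right / below.
    """
    col = math.floor((x - xmin) / width)
    row = math.floor((ymax - y) / height)
    return row, col


# Hypothetical 2 x 2 grid of 1500 m blocks with made-up fold labels
block_folds: Dict[Block, int] = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}
xmin, ymax = 440_000.0, 4_655_000.0
points = [(440_100.0, 4_654_900.0), (442_000.0, 4_654_000.0), (440_500.0, 4_653_000.0)]

# Points outside every block map to None, much as an intersection overlay drops them
point_folds: List[Optional[int]] = [
    block_folds.get(block_of(x, y, xmin, ymax, 1500.0, 1500.0)) for x, y in points
]
print(point_folds)  # → [2, 1, 1]
```

The polygon overlay used in the example is more general: it also works for blocks that are not a regular grid, at the cost of a geometric intersection per point.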

## 3. Compare Random and Spatial cross-validation [![View Jupyter Notebook](https://img.shields.io/badge/view-Jupyter%20notebook-lightgrey.svg)](https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/notebooks/spatialkfold_intro.ipynb)

<p align="center">
  <img src="https://github.com/WalidGharianiEAGLE/spatial-kfold/blob/main/images/randomCV_spatialCV.png?raw=true" width="800" />
</p>
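
The figure contrasts random and spatial cross-validation. A self-contained sketch of the same kind of comparison with scikit-learn, assuming synthetic, spatially structured data and hypothetical strip-shaped folds in place of spatial-kfold's output; the model and fold layout are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic, spatially structured data (assumption): the target is a smooth surface
# over x/y plus noise, and the coordinates are the only features.
n = 300
xy = rng.uniform(0, 10_000, size=(n, 2))
y = np.sin(xy[:, 0] / 2_000) + np.cos(xy[:, 1] / 2_000) + rng.normal(scale=0.1, size=n)

# Hypothetical spatial folds: five 2 km wide vertical strips, labelled 1..5
groups = np.floor(xy[:, 0] / 2_000).astype(int) + 1

model = RandomForestRegressor(n_estimators=50, random_state=0)
scoring = "neg_root_mean_squared_error"

# Random CV: folds ignore location, so test points have close neighbours in training
random_cv = cross_val_score(model, xy, y, scoring=scoring,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0))
# Spatial CV: each strip is held out in turn ("Leave Region Out")
spatial_cv = cross_val_score(model, xy, y, scoring=scoring,
                             cv=LeaveOneGroupOut(), groups=groups)

print("random CV RMSE: ", -random_cv.mean())
print("spatial CV RMSE:", -spatial_cv.mean())
```

On data like this the spatial estimate is usually the larger (more pessimistic) error, because the model has to predict into strips it never saw during training; the exact numbers depend on the synthetic data and seeds.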

# Credits

This package was inspired by the following R packages:

* [CAST](https://github.com/HannaMeyer/CAST/)
* [spatialsample](https://github.com/tidymodels/spatialsample/) 

# Dependencies

This project relies on the following dependencies:
* [pandas](https://pandas.pydata.org)
* [numpy](https://numpy.org)
* [geopandas](https://geopandas.org)
* [shapely](https://shapely.readthedocs.io)
* [matplotlib](https://matplotlib.org)
* [scikit-learn](https://scikit-learn.org)


# Citation

If you use spatial-kfold in your research or work, please cite it using one of the following entries:

- MLA Style:

```
Ghariani, Walid. "spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies." 2023. GitHub, https://github.com/WalidGharianiEAGLE/spatial-kfold
```
- BibTeX Style:

```
@Misc{spatial-kfold,
author = {Walid Ghariani},
title = {spatial-kfold: A Python Package for Spatial Resampling Toward More Reliable Cross-Validation in Spatial Studies},
howpublished = {GitHub},
year = {2023},
url = {https://github.com/WalidGharianiEAGLE/spatial-kfold}
}
```
# Resources

Tutorials and talks (mainly R-based) explaining why spatial resampling and spatial cross-validation matter:

* [Hanna Meyer: "Machine-learning based modelling of spatial and spatio-temporal data"](https://www.youtube.com/watch?v=QGjdS1igq78&t=1271s)
* [Jannes Münchow: "The importance of spatial cross-validation in predictive modeling"](https://www.youtube.com/watch?v=1rSoiSb7xbw&t=649s)
* [Julia Silge: Spatial resampling for more reliable model evaluation with geographic data](https://www.youtube.com/watch?v=wVrcw_ek3a4&t=904s)

# Bibliography

Meyer, Hanna, et al. "Importance of spatial predictor variable selection in machine learning applications - Moving from data reproduction to spatial prediction." Ecological Modelling 411 (2019): 108815. https://doi.org/10.1016/j.ecolmodel.2019.108815

Schratz, Patrick, et al. "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data." Ecological Modelling 406 (2019): 109-120. https://doi.org/10.1016/j.ecolmodel.2019.06.002

Schratz, Patrick, et al. "mlr3spatiotempcv: Spatiotemporal resampling methods for machine learning in R." arXiv preprint arXiv:2110.12674 (2021). https://arxiv.org/abs/2110.12674

Valavi, Roozbeh, et al. "blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models." bioRxiv (2018): 357798. https://doi.org/10.1101/357798

            
