# GeoPre: Geospatial Data Processing Toolkit
**GeoPre** is a Python library designed to streamline common geospatial data operations, offering a unified interface for handling raster and vector datasets. It simplifies preprocessing tasks essential for GIS analysis, machine learning workflows, and remote sensing applications.
### Key Features
- **Data Scaling**:
  - Normalization (Z-Score) and Min-Max scaling for raster bands.
  - Prepares data for ML models while preserving geospatial metadata.
- **CRS Management**:
  - Retrieve and compare Coordinate Reference Systems (CRS) across raster (Rasterio/Xarray) and vector (GeoPandas) datasets.
  - Ensure consistency between datasets with automated CRS checks.
- **Reprojection**:
  - Reproject vector data (GeoDataFrames) and raster data (Rasterio/Xarray) to any target CRS.
  - Supports EPSG codes, WKT, and Proj4 strings.
- **No-Data Masking**:
  - Handle missing values in raster datasets (NumPy/Xarray) with flexible masking.
  - Integrates seamlessly with raster metadata for error-free workflows.
- **Cloud Masking**:
  - Identify and mask clouds in Sentinel-2 and Landsat imagery.
  - Supports multiple methods: QA bands, scene classification layers (SCL), probability bands, and OmniCloudMask AI-based detection.
  - Optionally mask cloud shadows for improved accuracy.
- **Band Stacking**:
  - Stack multiple raster bands from a folder into a single multi-band raster for analysis.
  - Supports automatic band detection and resampling for different resolutions.
### Supported Data Types
- **Raster**: NumPy arrays, Rasterio `DatasetReader`, Xarray `DataArray` (via rioxarray).
- **Vector**: GeoPandas `GeoDataFrame`.
### Benefits of GeoPre
- **Unified Workflow**: Eliminates boilerplate code by providing consistent functions for raster and vector data.
- **Interoperability**: Bridges gaps between GeoPandas, Rasterio, and Xarray, ensuring smooth data transitions.
- **Robust Error Handling**: Automatically detects CRS mismatches and missing metadata to prevent silent failures.
- **Efficiency**: Optimized reprojection and masking operations reduce preprocessing time for large datasets.
- **ML-Ready Outputs**: Scaling functions preserve data structure, making outputs directly usable in machine learning pipelines.
Ideal for researchers and developers working with geospatial data, **GeoPre** enhances productivity by standardizing preprocessing steps and ensuring compatibility across diverse geospatial tools.
## Installation
Ensure you have the required dependencies installed before using this library:
```bash
pip install numpy geopandas rasterio rioxarray xarray pyproj
```
## Usage
### 1. Data Scaling
```python
import numpy as np
from scaling_and_reproject import Z_score_scaling, Min_Max_Scaling
data = np.array([[10, 20, 30], [40, 50, 60]])
z_scaled = Z_score_scaling(data)
minmax_scaled = Min_Max_Scaling(data)
```
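For reference, both transforms can be written in a few lines of NumPy. The block below is a minimal sketch of the standard formulas, not GeoPre's implementation: `z_score` and `min_max` are hypothetical helper names, and the handling of constant bands is an assumption of this sketch.

```python
import numpy as np

def z_score(band):
    """Standardize to zero mean and unit variance (hypothetical helper)."""
    band = np.asarray(band, dtype=float)
    std = band.std()
    if std == 0:
        # Constant band: return zeros instead of dividing by zero (assumption).
        return np.zeros_like(band)
    return (band - band.mean()) / std

def min_max(band):
    """Rescale linearly to the range [0, 1] (hypothetical helper)."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    if hi == lo:
        # Constant band: return zeros instead of dividing by zero (assumption).
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)

data = np.array([[10, 20, 30], [40, 50, 60]])
z = z_score(data)
mm = min_max(data)
print(z.mean(), z.std())   # close to 0 and 1
print(mm.min(), mm.max())  # 0.0 and 1.0
```

Both helpers return arrays with the same shape as the input, so the result can be written back with the original raster profile.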
### 2. CRS Management
```python
import geopandas as gpd
import rasterio
from scaling_and_reproject import get_crs, compare_crs
vector = gpd.read_file("data.shp")
# Open the raster in a context manager so the file handle is closed afterwards
with rasterio.open("image.tif") as raster:
    print(get_crs(vector))  # e.g. EPSG:4326
    print(compare_crs(raster, vector))  # CRS comparison results
```
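The idea behind a CRS consistency check can be illustrated with the standard library alone. The sketch below only normalizes EPSG identifiers before comparing them; `epsg_code` and `same_epsg` are hypothetical helpers, not GeoPre functions. WKT and Proj4 inputs are deliberately out of scope here, and a full check would compare parsed CRS objects (for example `pyproj.CRS` instances, which support equality comparison).

```python
import re

def epsg_code(crs):
    """Return the integer EPSG code for inputs such as 4326 or "epsg:4326".

    Hypothetical helper: returns None for anything it cannot parse
    (WKT, Proj4 strings, ...).
    """
    if isinstance(crs, int):
        return crs
    match = re.fullmatch(r"\s*epsg:(\d+)\s*", str(crs), flags=re.IGNORECASE)
    return int(match.group(1)) if match else None

def same_epsg(crs_a, crs_b):
    """True only when both inputs resolve to the same EPSG code."""
    code_a, code_b = epsg_code(crs_a), epsg_code(crs_b)
    return code_a is not None and code_a == code_b

print(same_epsg("EPSG:4326", "epsg:4326"))
print(same_epsg("EPSG:4326", "EPSG:3857"))
```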
### 3. Reprojection
```python
import geopandas as gpd
import rasterio
import rioxarray
from scaling_and_reproject import reproject_data

# Vector reprojection
vector = gpd.read_file("data.shp")
reprojected_vector = reproject_data(vector, "EPSG:3857")

# Raster reprojection (Rasterio)
with rasterio.open("input.tif") as src:
    array, metadata = reproject_data(src, "EPSG:32633")

# Xarray reprojection (xr.open_rasterio was removed from xarray; use rioxarray)
da = rioxarray.open_rasterio("image.tif")
reprojected_da = reproject_data(da, "EPSG:4326")
```
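To make concrete what reprojecting from EPSG:4326 to EPSG:3857 does to a coordinate, here is the spherical Web Mercator forward formula written out with the standard library. This is an illustrative sketch only: `lonlat_to_web_mercator` is a hypothetical helper, and GeoPre presumably delegates the transformation to its underlying libraries rather than computing it this way.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator metres (hypothetical helper)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

x, y = lonlat_to_web_mercator(9.19, 45.46)
print(x, y)
```

The formula diverges towards the poles, which is why Web Mercator is normally restricted to latitudes between roughly -85.06 and 85.06 degrees.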
### 4. Data Masking
```python
import rasterio
import rioxarray
from scaling_and_reproject import mask_raster_data

# Rasterio workflow
with rasterio.open("data.tif") as src:
    data = src.read(1)
    masked, profile = mask_raster_data(data, src.profile)

# rioxarray workflow (xr.open_rasterio was removed from xarray; use rioxarray)
da = rioxarray.open_rasterio("data.tif")
masked_da = mask_raster_data(da)
```
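Conceptually, no-data masking replaces the sentinel value recorded in the raster metadata with something that statistics can ignore. The following NumPy sketch shows one common approach, converting no-data pixels to NaN. `mask_nodata` is a hypothetical helper and the choice of NaN is an assumption; GeoPre's `mask_raster_data` may represent masked pixels differently.

```python
import numpy as np

def mask_nodata(band, nodata):
    """Return a float copy of `band` with no-data pixels set to NaN (hypothetical helper)."""
    out = np.array(band, dtype=float)  # np.array copies, so the input is left untouched
    if nodata is not None:
        out[out == nodata] = np.nan
    return out

# -9999 plays the role of the no-data value read from the raster profile
band = np.array([[-9999, 12, 15], [18, -9999, 21]])
masked = mask_nodata(band, nodata=-9999)
print(np.nanmean(masked))  # mean over valid pixels only
```

With Rasterio, the sentinel would typically come from the dataset itself, for example `src.nodata`.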
### 5. Cloud Masking
#### `mask_clouds_S2`
**Description**: Masks clouds and optionally shadows in a Sentinel-2 raster image using various methods.
**Parameters**:
- **image_path** *(str)*: Path to the input raster image.
- **output_path** *(str, optional)*: Path to save the masked output raster. Defaults to the same directory as the input with '_masked' appended to the filename.
- **method** *(str, optional)*: The method for masking ('auto', 'qa', 'probability', 'omnicloudmask', 'scl', 'standard'). Defaults to 'auto'.
- **mask_shadows** *(bool, optional)*: Whether to mask cloud shadows. Defaults to False.
- **threshold** *(int, optional)*: Cloud probability threshold from 0 to 100, used only when masking with a cloud probability band. Defaults to 20.
- **nodata_value** *(int or float, optional)*: Value for no-data regions. Defaults to `np.nan`.
**Returns**:
- *(str)*: The path to the saved masked output raster.
#### Example:
```python
from cloud_masking import mask_clouds_S2
output_s2 = mask_clouds_S2("sentinel2_image.tif", method='auto', mask_shadows=True)
```
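As an illustration of the 'scl' and 'probability' methods, the sketch below builds cloud masks from a Sentinel-2 Level-2A scene classification layer and from a cloud probability band using NumPy. The SCL class codes are the standard ones (3 cloud shadow, 8 and 9 medium and high cloud probability, 10 thin cirrus). The helper names, the exact set of classes GeoPre masks, and the strict `>` comparison against the threshold are assumptions of this sketch.

```python
import numpy as np

# Sentinel-2 L2A Scene Classification (SCL) class codes
SCL_CLOUD_SHADOW = 3
SCL_CLOUD_CLASSES = (8, 9, 10)  # medium probability, high probability, thin cirrus

def scl_cloud_mask(scl, mask_shadows=False):
    """Boolean mask that is True where the SCL marks cloud (hypothetical helper)."""
    classes = list(SCL_CLOUD_CLASSES)
    if mask_shadows:
        classes.append(SCL_CLOUD_SHADOW)
    return np.isin(scl, classes)

def probability_cloud_mask(prob, threshold=20):
    """Boolean mask that is True where cloud probability (0-100) exceeds the threshold."""
    return np.asarray(prob) > threshold

scl = np.array([[4, 8, 3], [9, 5, 10]])
band = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# Replace cloudy and shadowed pixels with NaN, leaving clear pixels unchanged
masked = np.where(scl_cloud_mask(scl, mask_shadows=True), np.nan, band)
print(masked)
```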
#### `mask_clouds_landsat`
**Description**: Masks clouds and optionally shadows in a Landsat raster image using various methods.
**Parameters**:
- **image_path** *(str)*: Path to the input multi-band raster image.
- **output_path** *(str, optional)*: Path to save the masked output raster. Defaults to the same directory as the input with '_masked' suffix.
- **method** *(str, optional)*: The method for masking ('auto', 'qa', 'omnicloudmask'). Defaults to 'auto'.
- **mask_shadows** *(bool, optional)*: Whether to mask cloud shadows. Defaults to False.
- **nodata_value** *(int or float, optional)*: Value for no-data regions. Defaults to `np.nan`.
**Returns**:
- *(str)*: The path to the saved masked output raster.
#### Example:
```python
from cloud_masking import mask_clouds_landsat
output_landsat = mask_clouds_landsat("landsat_image.tif", method='auto', mask_shadows=True)
```
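For the 'qa' method, cloud flags are stored as individual bits of the QA band. The sketch below decodes the Landsat Collection 2 `QA_PIXEL` layout (bit 1 dilated cloud, bit 2 cirrus, bit 3 cloud, bit 4 cloud shadow) with NumPy. `qa_cloud_mask` is a hypothetical helper, and which of these bits GeoPre actually tests is an assumption.

```python
import numpy as np

# Landsat Collection 2 QA_PIXEL bit positions
DILATED_CLOUD_BIT = 1
CIRRUS_BIT = 2
CLOUD_BIT = 3
CLOUD_SHADOW_BIT = 4

def qa_cloud_mask(qa, mask_shadows=False):
    """Boolean mask that is True where any selected QA bit is set (hypothetical helper)."""
    qa = np.asarray(qa, dtype=np.uint16)
    bits = (1 << DILATED_CLOUD_BIT) | (1 << CIRRUS_BIT) | (1 << CLOUD_BIT)
    if mask_shadows:
        bits |= 1 << CLOUD_SHADOW_BIT
    return (qa & bits) != 0

# Example QA values: 21824 has none of the four bits set,
# 22280 has the cloud bit set, 23888 has the cloud shadow bit set
qa = np.array([[21824, 22280], [23888, 21824]])
print(qa_cloud_mask(qa))
print(qa_cloud_mask(qa, mask_shadows=True))
```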
### 6. Band Stacking
#### `stack_bands`
**Description**: Stacks multiple raster bands from a folder into a single multi-band raster.
**Parameters**:
- **input_path** *(str or Path)*: Path to the folder containing band files.
- **required_bands** *(list of str)*: List of band name identifiers (e.g., ["B4", "B3", "B2"]).
- **output_path** *(str or Path, optional)*: Path to save the stacked raster. Defaults to "stacked.tif" in the input folder.
- **resolution** *(float, optional)*: Target resolution for resampling. Defaults to the highest available resolution.
**Returns**:
- *(str)*: The path to the saved stacked output raster.
#### Example:
```python
from stacking import stack_bands
stacked_image = stack_bands("/path/to/folder/containing/bands", ["B4", "B3", "B2"])
```
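The band detection step can be pictured as matching each requested identifier against the file names in the folder and keeping the requested order. The sketch below does this with `pathlib` only. `find_band_files`, the suffix-based matching rule, and the list of extensions are assumptions for illustration and may differ from how `stack_bands` locates files.

```python
import tempfile
from pathlib import Path

def find_band_files(folder, required_bands, extensions=(".tif", ".tiff", ".jp2")):
    """Return one file per requested band, in the requested order (hypothetical helper)."""
    folder = Path(folder)
    candidates = sorted(p for p in folder.iterdir() if p.suffix.lower() in extensions)
    matched = []
    for band in required_bands:
        # A file matches when its name (without extension) ends with the band identifier
        hits = [p for p in candidates if p.stem.upper().endswith(band.upper())]
        if not hits:
            raise FileNotFoundError(f"No file found for band {band!r} in {folder}")
        matched.append(hits[0])
    return matched

# Demonstrate on a temporary folder containing empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("scene_B2.tif", "scene_B3.tif", "scene_B4.tif", "scene_B11.tif"):
        (Path(tmp) / name).touch()
    names = [p.name for p in find_band_files(tmp, ["B4", "B3", "B2"])]

print(names)
```

Matching on the end of the file name keeps "B1" from picking up "B11", which a plain substring test would not.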
## Contributing
1. **Fork the repository**

   Click the "Fork" button at the top-right of this repository to create your copy.
2. **Create your feature branch**
   ```bash
   git checkout -b feature/your-feature
   ```
3. **Commit changes**
   ```bash
   git commit -am 'Add some feature'
   ```
4. **Push to branch**
   ```bash
   git push origin feature/your-feature
   ```
5. **Open a Pull Request**

   Navigate to the Pull Requests tab in the original repository and click "New Pull Request" to submit your changes.
## License
This project is licensed under the MIT License. See LICENSE for more information.
## Author
Matteo Gobbi Frattini, Liang Zhongyou – [MatteoGF](https://github.com/MatteoGF)
Raw data
{
"_id": null,
"home_page": "https://github.com/MatteoGF",
"name": "geopreprova",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "sentinel-1 glacier velocity offset tracking remote sensing",
"author": "Matteo Gobbi Frattini, Liang Zhongyou",
"author_email": "matteo.gf@live.it",
"download_url": "https://files.pythonhosted.org/packages/78/4f/bb44818cf4a432ebb0ec89f21e8b7fcaca736fc30133d663789af5e3286a/geopreprova-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": null,
"version": "0.1.2",
"project_urls": {
"Homepage": "https://github.com/MatteoGF"
},
"split_keywords": [
"sentinel-1",
"glacier",
"velocity",
"offset",
"tracking",
"remote",
"sensing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9e10c251c26f5b6eb76247511f81ed0dc2c79ac6c17735721be7af4f69bcc3e4",
"md5": "14ec2651021f0168da85489e227395d1",
"sha256": "5b19df2749aea1295b5851abf4f237f93b6e59aa53f091650352d2c5efbb8096"
},
"downloads": -1,
"filename": "geopreprova-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "14ec2651021f0168da85489e227395d1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15137,
"upload_time": "2025-02-02T20:02:21",
"upload_time_iso_8601": "2025-02-02T20:02:21.583178Z",
"url": "https://files.pythonhosted.org/packages/9e/10/c251c26f5b6eb76247511f81ed0dc2c79ac6c17735721be7af4f69bcc3e4/geopreprova-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "784fbb44818cf4a432ebb0ec89f21e8b7fcaca736fc30133d663789af5e3286a",
"md5": "c117d794e8582a388334d4598db21717",
"sha256": "a78c5e00ba1bf11562d3de320079f65b06119155e1b524be5386bed5de6dc311"
},
"downloads": -1,
"filename": "geopreprova-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "c117d794e8582a388334d4598db21717",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16490,
"upload_time": "2025-02-02T20:02:23",
"upload_time_iso_8601": "2025-02-02T20:02:23.971360Z",
"url": "https://files.pythonhosted.org/packages/78/4f/bb44818cf4a432ebb0ec89f21e8b7fcaca736fc30133d663789af5e3286a/geopreprova-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-02 20:02:23",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "geopreprova"
}