earthstat


* Name: earthstat
* Version: 0.8.2
* Summary: EarthStat Library
* Upload time: 2024-04-02 18:24:25
* Requires Python: >=3.9
* License: MIT License
* Keywords: earthstat
* Requirements: rasterio, shapely, numpy, pandas, tqdm, geopandas, pyproj, cdsapi, xarray, rioxarray

# EarthStat

[![image](https://img.shields.io/pypi/v/earthstat.svg)](https://pypi.python.org/pypi/earthstat)
[![Downloads](https://static.pepy.tech/badge/earthstat)](https://pepy.tech/project/earthstat)
[![image](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![image](https://img.shields.io/conda/vn/conda-forge/earthstat.svg)](https://anaconda.org/conda-forge/earthstat)
<a href="https://www.buymeacoffee.com/abdelrahmansaleh"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" height="20px"></a>

**A Python package for efficiently generating statistical datasets from raster data for spatial units.**

* GitHub repo: [https://github.com/AbdelrahmanAmr3/earthstat](https://github.com/AbdelrahmanAmr3/earthstat)
* Documentation: [https://abdelrahmanamr3.github.io/earthstat](https://abdelrahmanamr3.github.io/earthstat)
* PyPI: [https://pypi.org/project/earthstat](https://pypi.org/project/earthstat/)
* Free software: MIT license

EarthStat workflow notebooks:

* Main Workflow: [Google Colab](https://colab.research.google.com/github/AbdelrahmanAmr3/earthstat/blob/master/docs/examples/intro.ipynb),
[Binder](https://colab.research.google.com/github/AbdelrahmanAmr3/earthstat/blob/master/docs/examples/intro.ipynb)

* xEarthStat Workflow: [Google Colab](https://colab.research.google.com/github/AbdelrahmanAmr3/earthstat/blob/master/docs/examples/xES.ipynb),
[Binder]()


## Introduction

Inspired by my engagement with the AgML community's "Regional Crop Yield Forecasting" challenge, I created a Python library designed to set benchmarks for Machine Learning (ML) models. The library provides an efficient workflow for extracting statistical information from large remote sensing and climate datasets. It currently offers two workflows. The main EarthStat workflow handles GeoTIFF files. The second workflow is dedicated to AgERA5 datasets: it lets the user download a large number of variables through the CDS API, then extracts and aggregates all of the downloaded data. EarthStat's workflows offer multiprocessing and GPU acceleration as options for parallel computation. The library is particularly suited to creating statistical datasets for ML models and for environmental analysis and monitoring.

## EarthStat Main Workflow
This diagram illustrates the workflow of the geospatial data processing implemented in EarthStat from the initialized dataset to the created CSV file.

![Geospatial Data Processing Workflow](docs/assests/workflow.png)

## EarthStat Main Workflow Features

EarthStat streamlines the extraction of statistical information from geographic data through the following workflow features:

- **Data Initialization & Geo-metadata Readability:** Loads datasets into the EarthStat workflow and reports the key geo-metadata of each input (Raster, Mask, Shapefile).

- **netCDF Conversion:** Converts netCDF files to GeoTIFF format so they can be used in the workflow.

- **Data Compatibility Assurance:** Checks the initialized data (Raster, Mask, Shapefile) for compatibility and identifies geospatial discrepancies among them.

- **Automated Resolution of Compatibility Issues:** Resolves compatibility issues automatically by resampling or reprojecting masks and by reprojecting shapefiles.

- **Targeted Region Selection:** Filters the shapefile to the targeted region.

- **Data Clipping:** Clips raster data to specific shapefile boundaries.

- **Advanced Statistical Data Extraction:** Offers a variety of statistical aggregation methods.

- **Efficient Parallel Processing:** Uses multiprocessing to speed up processing of large datasets.
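
The core idea behind these features, aggregating raster pixels per spatial unit with an optional mask, can be sketched in a few lines. The following is a minimal, dependency-free illustration and not EarthStat's actual API: the function name `zonal_stats`, the integer zone grid (a stand-in for rasterized shapefile polygons), and the toy arrays are all assumptions made for this example.

```python
from statistics import mean


def zonal_stats(raster, zones, mask=None, nodata=None):
    """Aggregate raster values per zone ID (illustrative, not EarthStat's API).

    raster, zones, mask: equally sized 2D lists. `zones` holds an integer
    region ID per pixel (0 = outside every region); `mask` holds 1 for
    pixels to keep (e.g. cropland) and 0 for pixels to ignore.
    """
    values_by_zone = {}
    for r, row in enumerate(raster):
        for c, value in enumerate(row):
            zone = zones[r][c]
            if zone == 0 or value == nodata:
                continue  # pixel is outside all regions or has no data
            if mask is not None and mask[r][c] == 0:
                continue  # pixel is excluded by the mask
            values_by_zone.setdefault(zone, []).append(value)
    return {
        zone: {
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
            "count": len(vals),
        }
        for zone, vals in values_by_zone.items()
    }


# Toy 3x3 inputs (assumed values, for illustration only).
raster = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, -9999.0],
]
zones = [
    [1, 1, 2],
    [1, 1, 2],
    [0, 2, 2],
]
mask = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
]

stats = zonal_stats(raster, zones, mask=mask, nodata=-9999.0)
for zone_id, zone_stats in sorted(stats.items()):
    print(zone_id, zone_stats)
```

In a real workflow the zone grid would come from the shapefile geometries and the arrays from GeoTIFF bands; the sketch only shows the aggregation step that turns pixels into one row of statistics per region.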

## xEarthStat Workflow For AgERA5
This diagram illustrates the workflow of xEarthStat for AgERA5 data processing.

![xEarthStat Workflow](docs/assests/xES_workflow.png)


## xEarthStat Workflow Features
- **Unlimited AgERA5 Data Downloads**: The xEarthStat workflow works around the limitations of the CDS server, allowing users to download any quantity of data for the required variables.
- **Fully Automated**: The workflow is fully automated and requires no prior Python knowledge. Users only need to select the variables to download and aggregate, specify the start and end years that determine the data volume, and provide the shapefile containing the geometry objects.
- **Parallel Computation**: The workflow detects GPU availability and, when a GPU is present, shifts aggregation to the GPU for parallel computation. It also lets users leverage the available CPU cores for multiprocessing (parallel execution), which speeds up I/O-bound tasks.
- **Aggregated Data as CSV**: The workflow outputs a single organized CSV file that compiles all downloaded and aggregated variables.
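
A common way to keep individual requests small when pulling many years and variables from a server is to split the job into one request per variable and year. The sketch below only builds such a request plan and never contacts the CDS server. Whether xEarthStat splits its downloads exactly this way is an assumption, and `plan_requests`, the dictionary keys, and the variable names are hypothetical placeholders rather than real CDS API parameters.

```python
from itertools import product


def plan_requests(variables, start_year, end_year):
    """Return one small request description per (variable, year) pair.

    Hypothetical helper: the keys are placeholders, not the CDS API schema.
    """
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    years = range(start_year, end_year + 1)  # end_year is inclusive
    return [
        {"variable": variable, "year": str(year)}
        for variable, year in product(variables, years)
    ]


# Two placeholder variables over two years yield four small requests.
plan = plan_requests(["temperature", "precipitation"], 2020, 2021)
for request in plan:
    print(request)
```

Each entry in the plan could then be downloaded and aggregated independently, which is also what makes the work easy to distribute across CPU cores.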

### xEarthStat Workflow Performance on Google Colab
This table shows the workflow's performance on Google Colab across several configurations, from CPU single processing to multiprocessing and GPU parallel computation.

| Data      | Variables | Number of Geo-Objects | Dataset | Processing Unit            | Time for a single run (mm:ss or h:mm:ss) |
|-----------|-----------|-----------------------|---------|----------------------------|------------------------------------------|
| Two years | 7         | EU (478)              | Dekadal | CPU – Single Processing    | 13:56                                    |
| Two years | 7         | EU (478)              | Dekadal | CPU – Multiprocessing      | 13:48                                    |
| Two years | 7         | EU (478)              | Daily   | CPU – Single Processing    | 1:20:43                                  |
| Two years | 7         | EU (478)              | Daily   | CPU – Multiprocessing      | 1:18:32                                  |
| Two years | 7         | EU (478)              | Dekadal | T4 GPU – Single Processing | 04:32                                    |
| Two years | 7         | EU (478)              | Dekadal | T4 GPU – Multiprocessing   | 04:12                                    |
| Two years | 7         | EU (478)              | Daily   | T4 GPU – Single Processing | 06:35                                    |
| Two years | 7         | EU (478)              | Daily   | T4 GPU – Multiprocessing   | 06:14                                    |
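
To read the timings as speedups, the clock strings can be converted to seconds and compared. The short sketch below does this for the single-processing rows of the table; the helper name `to_seconds` and the dictionary layout are choices made for this example only.

```python
def to_seconds(clock):
    """Convert an 'mm:ss' or 'h:mm:ss' string to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Single-processing timings copied from the table above.
timings = {
    "Dekadal": {"cpu": "13:56", "gpu": "04:32"},
    "Daily": {"cpu": "1:20:43", "gpu": "06:35"},
}

speedups = {
    dataset: to_seconds(t["cpu"]) / to_seconds(t["gpu"])
    for dataset, t in timings.items()
}
for dataset, speedup in speedups.items():
    print(f"{dataset}: GPU is about {speedup:.1f}x faster than CPU")
```

With these figures the T4 GPU is roughly 3 times faster on the dekadal dataset and roughly 12 times faster on the daily dataset, while enabling multiprocessing changes each pair of timings by less than 10%.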

## EarthStat Python Library - Improvements Roadmap
### EarthStat Main Workflow
#### Data Processing and Scenario Management Enhancements 
- [x] Offer more statistical options for aggregation.
- [ ] Introduce thresholding option for masks to refine data selection.
- [ ] Refactor Dataloader and Data Compatibility for no mask scenario.

#### Automation for User Convenience
- [ ] Implement automatic detection of the lag between date ranges of predictor data.
- [ ] Automatically identify the column names for countries in the dataset.
- [ ] Enable users to specify date ranges for predictor data, improving data filtering capabilities.

### xEarthStat Workflow for AgERA5
- [ ] Add an option to apply a mask to the AgERA5 data.



## Installation
To install EarthStat, ensure you have Python 3.9 or later installed. 

Install with pip:
```
pip install earthstat
```
Install with Conda:
```
conda install conda-forge::earthstat
```
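
After installing, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. This sketch uses only the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption made for this example and is not part of EarthStat.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("earthstat")
print("earthstat version:", found if found else "not installed")
```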
## EarthStat Usage
* [EarthStat Main Workflow](https://abdelrahmanamr3.github.io/earthstat/usage/main_usage)


* [xEarthStat for AgERA5 Workflow](https://abdelrahmanamr3.github.io/earthstat/usage/xES_usage)


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "earthstat",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "earthstat",
    "author": null,
    "author_email": "Abdelrahman Saleh <abdulrahman.amr.ali@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b8/4b/43bb07e98ccc47e709ca03abea8db7a8b8892741782c8ab9de6ac52936bb/earthstat-0.8.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "EarthStat Library",
    "version": "0.8.2",
    "project_urls": {
        "Homepage": "https://github.com/AbdelrahmanAmr3/earthstat"
    },
    "split_keywords": [
        "earthstat"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3d24b06e7cee47913767c077e9c70cd3f9f995dfb1415693df46afd3587f7ce9",
                "md5": "6d516be41ad6010a6e1c36a69424e8ab",
                "sha256": "897d139dc340b6ea3ca39a1253561478ef140b28a898faee5edc97bbc3eb1281"
            },
            "downloads": -1,
            "filename": "earthstat-0.8.2-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6d516be41ad6010a6e1c36a69424e8ab",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.9",
            "size": 34505,
            "upload_time": "2024-04-02T18:24:23",
            "upload_time_iso_8601": "2024-04-02T18:24:23.512119Z",
            "url": "https://files.pythonhosted.org/packages/3d/24/b06e7cee47913767c077e9c70cd3f9f995dfb1415693df46afd3587f7ce9/earthstat-0.8.2-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b84b43bb07e98ccc47e709ca03abea8db7a8b8892741782c8ab9de6ac52936bb",
                "md5": "3a80ad833d18c0b28d9369a74e75364a",
                "sha256": "5bb37890269f6cbd5e1338597851ee74a30bf32c8497c99bd789db230a9df576"
            },
            "downloads": -1,
            "filename": "earthstat-0.8.2.tar.gz",
            "has_sig": false,
            "md5_digest": "3a80ad833d18c0b28d9369a74e75364a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 1353831,
            "upload_time": "2024-04-02T18:24:25",
            "upload_time_iso_8601": "2024-04-02T18:24:25.595737Z",
            "url": "https://files.pythonhosted.org/packages/b8/4b/43bb07e98ccc47e709ca03abea8db7a8b8892741782c8ab9de6ac52936bb/earthstat-0.8.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-02 18:24:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AbdelrahmanAmr3",
    "github_project": "earthstat",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "rasterio",
            "specs": []
        },
        {
            "name": "shapely",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "geopandas",
            "specs": []
        },
        {
            "name": "pyproj",
            "specs": []
        },
        {
            "name": "cdsapi",
            "specs": []
        },
        {
            "name": "xarray",
            "specs": [
                [
                    "==",
                    "2023.11.0"
                ]
            ]
        },
        {
            "name": "rioxarray",
            "specs": []
        }
    ],
    "lcname": "earthstat"
}
        