mosaiks


- Name: mosaiks
- Version: 1.0.1
- Summary: Use the MOSAIKS algorithm to create features from satellite imagery
- Upload time: 2023-12-13 12:04:04
- Author: IDinsight
- Requires Python: >=3.9
- Keywords: satellite, mosaiks, geospatial
# MOSAIKS Satellite Imagery Featurization

MOSAIKS is a Python package that performs parallelized encoding of satellite imagery into easy-to-use features using the MOSAIKS algorithm and Dask for parallel processing. This package enables users to generate feature vectors based on satellite images by providing a list of latitudes and longitudes and Microsoft's Planetary Computer API key. It supports various satellites, image sizes, time periods, and parallelization options.

We implement the MOSAIKS algorithm, which is based on random convolutional features (RCF), following [Rolf et al., 2021](https://www.nature.com/articles/s41467-021-24638-z). The authors of that paper provide a global cross-section of pre-computed features, created from 2019 Planet imagery, at [mosaiks.org](https://www.mosaiks.org/), along with tutorials and related research.
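To make "random convolutional features" more concrete, here is a minimal pure-Python sketch of the general RCF idea. It is not this package's implementation: drawing filter weights from a standard normal distribution, the bias of 1.0, and all function names below are assumptions made for illustration. Each random filter is slid over the image, the responses pass through a ReLU in both signs, and the results are averaged over the whole image, so in this sketch every filter yields two features.

```python
import random
from typing import List

# An image or a filter stored as [channel][row][col] nested lists.
Array3D = List[List[List[float]]]


def make_random_filters(n_filters: int, n_channels: int, kernel_size: int, seed: int) -> List[Array3D]:
    """Draw filters with i.i.d. standard-normal weights (assumption: RCF can also sample image patches)."""
    rng = random.Random(seed)  # fixed seed -> the same filters on every run
    return [
        [
            [[rng.gauss(0.0, 1.0) for _ in range(kernel_size)] for _ in range(kernel_size)]
            for _ in range(n_channels)
        ]
        for _ in range(n_filters)
    ]


def convolve_valid(image: Array3D, filt: Array3D) -> List[float]:
    """Slide `filt` over `image` without padding (cross-correlation, as in CNN layers).

    Sums over all channels and returns the flattened response map.
    """
    k = len(filt[0])
    rows, cols = len(image[0]), len(image[0][0])
    responses = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            total = 0.0
            for band, kernel in zip(image, filt):
                for i in range(k):
                    for j in range(k):
                        total += band[r + i][c + j] * kernel[i][j]
            responses.append(total)
    return responses


def rcf_features(image: Array3D, filters: List[Array3D], bias: float = 1.0) -> List[float]:
    """Return two features per filter: mean ReLU(x - bias) and mean ReLU(-x - bias)."""
    features = []
    for filt in filters:
        responses = convolve_valid(image, filt)
        positive = [max(v - bias, 0.0) for v in responses]
        negative = [max(-v - bias, 0.0) for v in responses]
        # Average pooling over the whole image collapses each response map to one number.
        features.append(sum(positive) / len(positive))
        features.append(sum(negative) / len(negative))
    return features


# Usage: a random 3-band 8x8 "image" and 4 filters give 2 * 4 = 8 features.
demo_rng = random.Random(0)
demo_image = [[[demo_rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]
demo_filters = make_random_filters(n_filters=4, n_channels=3, kernel_size=3, seed=768)
print(len(rcf_features(demo_image, demo_filters)))  # 8
```

The fixed seed mirrors the role that `mosaiks_random_seed_for_filters` plays in `get_features`: features are only comparable across runs if the same filters are used.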

This package extends the functionality of the original MOSAIKS implementation in the following ways:

- Flexibility in choice of satellite (tested for Landsat-8 and Sentinel-2)
- Ability to select the timeframe from which to pull imagery
- Flexibility in choosing the size of the images centred on points of interest, image bands used, etc.
- Flexible upload and download of data
- Parallel processing with Dask to speed up fetching images and creating features.
- Once installed, the package can be run on any machine (with the API key).

The package has been tested via [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/) on Landsat-8 and Sentinel-2 imagery. **Please note this package has only been tested on Python 3.10 and 3.11**. Other versions of Python are expected to raise errors due to dependency conflicts.

For more detailed information on this package and how to use it, please see [this blog post](https://idinsight.github.io/tech-blog/blog/mosaiks_part_1/). For a preview of potential use cases for this package, please see [this blog post](https://www.idinsight.org/?post_type=article&p=20518&preview=true). For more information on MOSAIKS and previous use cases, see the [MOSAIKS website](https://www.mosaiks.org/).

Users of this package should acknowledge *IDinsight* and cite the MOSAIKS RCF algorithm as: Rolf, Esther, et al. "A generalizable and accessible approach to machine learning with global satellite imagery." *Nature Communications* 12.1 (2021): 4392.

## Quick Start

This section walks through a short demo to help you generate features quickly.

### Step 1: Set-up

Ensure you have all requirements set up:

1. Install Python 3.10 or 3.11.
2. Install the MOSAIKS package:

    ```sh
    pip install git+https://github.com/IDinsight/mosaiks
    ```

    or

    ```sh
    pip install mosaiks
    ```

3. Acquire a Planetary Computer API key from [Microsoft Planetary Computer (MPC)](https://planetarycomputer.microsoft.com/). Detailed instructions for getting an API key are in the FAQs section of this README.

    In your terminal, run the following and enter your API key when prompted (the `planetarycomputer` command is provided by the `planetary-computer` package):

    ```sh
    planetarycomputer configure
    ```

### Step 2: Test run in a Notebook

The quickest way to test the package is to run it in a notebook. Open a notebook in the environment where you completed Step 1 and run the following (the same code is in the `README_DEMO.ipynb` notebook):

1. **Configure GeoPandas**

    ```python
    import os

    # Tell GeoPandas not to use the PyGEOS backend, which resolves a backend
    # conflict. Set this before GeoPandas (or mosaiks) is imported.
    os.environ["USE_PYGEOS"] = "0"
    ```

2. **Create test data. In this case, we use five sample GPS coordinates**

    ```python
    # Example: Select 5 coordinates in Uttar Pradesh, India
    lats = [26.51268717, 26.55187804, 26.54949092, 26.54105597, 26.54843896]
    lons = [80.51489844, 80.54864309, 80.57813289, 80.51412136, 80.52254959]
    ```

3. **Execute a run of the `get_features` function:**

    ```python
    from mosaiks import get_features

    df_featurised = get_features(
        lats,
        lons,
        datetime="2017",  # or ["2013-01-01", "2013-12-31"] or ...
        image_width=1000,
    )

    df_featurised
    ```

    This runs `get_features` with default values for all other parameters and returns the features as a DataFrame.

4. **Run get_features with Dask parallelization**

    To run the code with the built-in Dask parallelization, set `parallelize` to `True` and `dask_chunksize` to a suitable integer given the size of your dataset.

    ```python
    df_featurised = get_features(
        lats,
        lons,
        datetime="2017", # or ["2013-01-01", "2013-12-31"] or ...
        image_width=1000,
        parallelize=True,
        dask_chunksize=2,  # set this to 200+ to see benefits from parallelization
    )

    df_featurised
    ```

    See the `get_features` docstring for the parameters that control the built-in parallelization scheme.
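To build intuition for `dask_chunksize`, the sketch below shows one plausible way points could be grouped into chunks before being handed to Dask workers. The helper `chunk_points` is hypothetical and not part of the mosaiks API; the package's own chunking may differ in detail.

```python
from typing import List, Sequence, Tuple


def chunk_points(
    lats: Sequence[float], lons: Sequence[float], chunksize: int
) -> List[List[Tuple[float, float]]]:
    """Split paired (lat, lon) points into consecutive chunks of at most `chunksize` points.

    Hypothetical helper for illustration; not part of the mosaiks API.
    """
    if len(lats) != len(lons):
        raise ValueError("lats and lons must have the same length")
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    points = list(zip(lats, lons))
    return [points[i : i + chunksize] for i in range(0, len(points), chunksize)]


# The five demo coordinates from Step 2 with dask_chunksize=2.
lats = [26.51268717, 26.55187804, 26.54949092, 26.54105597, 26.54843896]
lons = [80.51489844, 80.54864309, 80.57813289, 80.51412136, 80.52254959]
print([len(chunk) for chunk in chunk_points(lats, lons, chunksize=2)])  # [2, 2, 1]
```

Under this scheme each chunk is a separate unit of work, so very small chunks mostly add scheduling overhead. That is consistent with the demo's comment that chunk sizes of 200 or more are needed before parallelization pays off.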

## Core functionality of the system

The high-level flow of our featurisation pipeline is the following:

1. The user provides `lat` and `lon` lists for the points they want to featurize.
    - The user also passes any relevant parameters to the function (see the docstrings and FAQs).
2. For each GPS coordinate, the function fetches [STAC](https://stacspec.org/en) references to satellite images.
3. Once these are found, the function fetches the images (either all of them or only the least cloudy, depending on the `image_composite_method` parameter).
4. The function converts each image into features using the MOSAIKS algorithm.
5. Finally, the function returns a dataframe with the features alongside the STAC references to the image(s) used to create them.
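Step 3 can be illustrated with a small sketch, assuming STAC items are plain dictionaries whose `properties` carry the standard `eo:cloud_cover` field (percentage of the scene covered by cloud). `select_items` and `median_composite` are hypothetical helpers, not the package's code: the first keeps either the least cloudy item or all items, and the second builds the kind of per-pixel median composite that the `"all"` option is described as producing.

```python
from statistics import median
from typing import Any, Dict, List

Grid = List[List[float]]  # one image band as rows of pixel values


def select_items(items: List[Dict[str, Any]], method: str = "least_cloudy") -> List[Dict[str, Any]]:
    """Pick which STAC items to download for one point (hypothetical helper)."""
    if not items:
        return []
    if method == "least_cloudy":
        # Items without a cloud-cover value are treated as fully cloudy.
        return [min(items, key=lambda item: item["properties"].get("eo:cloud_cover", 100.0))]
    if method == "all":
        return list(items)
    raise ValueError(f"unknown image_composite_method: {method!r}")


def median_composite(images: List[Grid]) -> Grid:
    """Per-pixel median across equally sized single-band images."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[median(img[r][c] for img in images) for c in range(cols)] for r in range(rows)]


items = [
    {"id": "scene-a", "properties": {"eo:cloud_cover": 40.0}},
    {"id": "scene-b", "properties": {"eo:cloud_cover": 5.0}},
    {"id": "scene-c", "properties": {"eo:cloud_cover": 20.0}},
]
print([item["id"] for item in select_items(items)])  # ['scene-b']
print(median_composite([[[1, 2]], [[3, 4]], [[5, 0]]]))  # [[3, 2]]
```

A median rather than a mean is a common choice for composites because a single cloudy or otherwise anomalous pixel cannot drag the result far.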

## Repository structure

```
 ├── mosaiks
 │   ├── fetch -- fetching images
 │   ├── featurize -- converting images to MOSAIKS features
 │   └── pipeline -- get_features() is here.
 ├── tests -- pytests (need to install requirements_test to run)
 ├── README.md -- No but actually, read this.
 ├── README_DEMO.ipynb
 ├── requirements.txt
 ├── requirements_dev.txt
 ├── requirements_test.txt
 └── LICENSE
```

## FAQs

### • How do I get access to the Planetary Computer API key?

If you are running mosaiks locally or on a non-MPC server, then you need an access token for the satellite image database.

1. If you do not have an MPC account, go [here](https://planetarycomputer.microsoft.com/explore). You should see a “Request Access” button in the top right corner.

    It opens up a form which you should fill in. NB: Use your personal email ID rather than an institutional one. If you already have a Microsoft account, use the email ID (non-institutional) associated with it. Otherwise, you also have the additional step of creating a Microsoft account for the email ID you want to use for the MPC Hub.

    Once you submit the form, you should receive an email within a week granting you access to the hub.

2. To get the token, go [here](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/token). Request and copy the new token, and save it.

3. On your local or virtual machine, run `pip install planetary-computer` and then `planetarycomputer configure` from the console, and paste in the API key you generated.

4. You only need to do this once and should be able to access the database smoothly every time you run the pipeline after this.

5. More information is available [here](https://planetarycomputer.microsoft.com/docs/reference/sas/).

### • Can you tell me about all the parameters that I can use in `get_features`?

Below are all the parameters and defaults that `get_features` uses.

**NOTE:** You'll probably want to leave most of these as defaults. See the `.yml` file example that follows for the subset of parameters you are most likely to change.

```python
def get_features(
    latitudes: List[float],
    longitudes: List[float],
    datetime: str or List[str] or callable,
    satellite_name: str = "landsat-8-c2-l2", # or "sentinel-2-l2a"
    image_resolution: int = 30, # or 10 for Sentinel
    image_bands: List[str] = ["SR_B2", "SR_B3", "SR_B4", "SR_B5", "SR_B6", "SR_B7"], # For options, see FAQs below
    image_width: int = 3000,
    min_image_edge: int = 30,
    image_composite_method: str = "least_cloudy",   # or "all" to create a multi-image median composite before featurisation
    image_dtype: str = "int16", # or "int32" or "float". "int8" not supported.
    stac_api_name: str = "planetary-compute", # or "earth-search"
    n_mosaiks_features: int = 4000,
    mosaiks_kernel_size: int = 3,
    mosaiks_random_seed_for_filters: int = 768,
    model_device: str = "cpu",  # or "cuda" if NVIDIA GPU available
    parallelize: bool = False,
    dask_chunksize: int = 500,
    dask_client: Optional[Client] = None, # Provide to override the default per-run LocalCluster creation
    dask_n_workers: Optional[int] = None, # Set to None to auto-select maximum
    dask_threads_per_worker: Optional[int] = None, # Set to None to auto-select maximum
    dask_n_concurrent_tasks: Optional[int] = None, # Set to None to set equal to number of threads
    dask_sort_points_by_hilbert_distance: bool = True,
    setup_rasterio_env: bool = True,
) -> pd.DataFrame
```
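The `dask_sort_points_by_hilbert_distance` option presumably orders points along a space-filling Hilbert curve so that geographically close points end up in the same chunk. The sketch below uses the standard iterative Hilbert-curve index algorithm; the function names and the mapping of latitude and longitude onto a 2^16 x 2^16 grid are assumptions, and mosaiks may compute its ordering differently.

```python
from typing import List, Sequence, Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_order(lats: Sequence[float], lons: Sequence[float], order: int = 16) -> List[int]:
    """Indices of the points sorted by Hilbert distance (hypothetical helper)."""
    n = 2 ** order

    def key(i: int) -> int:
        # Map lon in [-180, 180] and lat in [-90, 90] onto integer grid cells.
        x = int((lons[i] + 180.0) / 360.0 * (n - 1))
        y = int((lats[i] + 90.0) / 180.0 * (n - 1))
        return hilbert_index(n, x, y)

    return sorted(range(len(lats)), key=key)
```

Reordering the inputs with `[lats[i] for i in idx]` and `[lons[i] for i in idx]`, where `idx = hilbert_order(lats, lons)`, before chunking would give spatially compact chunks. Consecutive positions on a Hilbert curve are always neighbouring grid cells, which is why it is preferred over simply sorting by latitude.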

You can also set these parameters in a `.yml` file, read the file into a dictionary, and pass it to `get_features` as `**kwargs`. Here is an example of a `.yml` file that sets some common project-specific parameters and leaves everything else as default:

```yml
datetime: "2017"
satellite_name: "landsat-8-c2-l2"
image_resolution: 30
image_bands:
  - "SR_B2"
  - "SR_B3"
  - "SR_B4"
  - "SR_B5"
  - "SR_B6"
  - "SR_B7"
image_width: 3000
image_composite_method: "least_cloudy"
n_mosaiks_features: 4000
model_device: "cpu" # or "cuda" if NVIDIA GPU available
parallelize: True
dask_chunksize: 500
```
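A minimal sketch of reading such a file and passing it to `get_features` as keyword arguments, assuming PyYAML is installed (`pip install pyyaml`). The helper names and the `params.yml` file name are hypothetical.

```python
from typing import Any, Dict

import yaml  # PyYAML; assumed to be installed separately


def parse_get_features_params(text: str) -> Dict[str, Any]:
    """Parse YAML text into a dict of get_features keyword arguments."""
    params = yaml.safe_load(text)
    if not isinstance(params, dict):
        raise ValueError("expected a YAML mapping of parameter names to values")
    return params


def load_get_features_params(path: str) -> Dict[str, Any]:
    """Read a .yml parameter file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_get_features_params(f.read())


# Usage, assuming the example above is saved as params.yml (hypothetical name):
# from mosaiks import get_features
# params = load_get_features_params("params.yml")
# df_featurised = get_features(lats, lons, **params)
```

`yaml.safe_load` is used instead of `yaml.load` so that a parameter file cannot construct arbitrary Python objects.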

### • How do I choose satellite parameters?

We have tested this package on two satellites: Sentinel-2 and Landsat-8.
Sentinel-2 images are available from 23 June 2015 (relevant for `datetime`) at 10m resolution (`image_resolution`) for 13 spectral bands (`image_bands`).
Landsat-8 images are available from 11 February 2013 at 30m resolution for 11 spectral bands.

You can explore Microsoft Planetary Computer's [data catalog](https://planetarycomputer.microsoft.com/explore) to learn more; it includes information about the satellites and links for further reading. You can also find guidance on choosing image bands for [Landsat](https://www.usgs.gov/faqs/what-are-band-designations-landsat-satellites) and [Sentinel](https://gisgeography.com/sentinel-2-bands-combinations/) imagery.
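Before launching a long run, it can help to check that the requested `datetime` overlaps the period when imagery exists. This sketch hard-codes the start dates and resolutions quoted in this FAQ; `SATELLITES`, `requested_range` and `overlaps_imagery` are hypothetical names, and only year strings or `[start, end]` ISO date pairs are handled (not the callable form `get_features` also accepts).

```python
from datetime import date
from typing import List, Tuple, Union

# First imagery dates and native resolutions as quoted in this README.
SATELLITES = {
    "landsat-8-c2-l2": {"first_date": date(2013, 2, 11), "resolution_m": 30},
    "sentinel-2-l2a": {"first_date": date(2015, 6, 23), "resolution_m": 10},
}


def requested_range(datetime_param: Union[str, List[str]]) -> Tuple[date, date]:
    """Turn "YYYY" or ["YYYY-MM-DD", "YYYY-MM-DD"] into (start, end) dates."""
    if isinstance(datetime_param, str) and len(datetime_param) == 4:
        year = int(datetime_param)
        return date(year, 1, 1), date(year, 12, 31)
    if isinstance(datetime_param, (list, tuple)) and len(datetime_param) == 2:
        start, end = (date.fromisoformat(s) for s in datetime_param)
        return start, end
    raise ValueError(f"unsupported datetime value: {datetime_param!r}")


def overlaps_imagery(satellite_name: str, datetime_param: Union[str, List[str]]) -> bool:
    """True if the requested period ends on or after the satellite's first imagery date."""
    _, end = requested_range(datetime_param)
    return end >= SATELLITES[satellite_name]["first_date"]


print(overlaps_imagery("landsat-8-c2-l2", "2012"))  # False
print(overlaps_imagery("sentinel-2-l2a", "2017"))  # True
```

The `resolution_m` entries can be used to fill `image_resolution` when switching between the two satellites.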

### • How do I contribute to this repo as a developer?

To contribute to this repository, you can make a feature branch and raise a PR (after making sure that the code works and relevant tests pass).

To set up your dev environment, you can go through the following steps:

1. Clone the mosaiks repository.
2. Run `pip install -e .` in the repository's root folder to install an editable local copy of the package, which you can then use in Python with `import mosaiks`.
3. Install the development and test requirements with `pip install -r requirements_dev.txt -r requirements_test.txt`.
4. Start contributing!

### • What if something isn't working for me?

We are happy to receive feedback on the package. Please do submit an issue, or if you know how to fix it, make a feature branch and raise a PR!

            
