ngiab-data-preprocess

Name: ngiab-data-preprocess
Version: 4.0.3
Summary: Graphical Tools for creating Next Gen Water model input data.
Author email: Josh Cunningham <jcunningham8@ua.edu>
Homepage: https://github.com/CIROH-UA/NGIAB_data_preprocess
Issues: https://github.com/CIROH-UA/NGIAB_data_preprocess/issues
Requires Python: <3.13,>=3.10
Uploaded: 2025-02-27 23:57:50
# NGIAB Data Preprocess

This repository contains tools for preparing data to run a [next gen](https://github.com/NOAA-OWP/ngen) simulation using [NGIAB](https://github.com/CIROH-UA/NGIAB-CloudInfra). The tools allow you to select a catchment of interest on an interactive map, choose a date range, and prepare the data with just a few clicks!

![map screenshot](https://github.com/CIROH-UA/NGIAB_data_preprocess/blob/main/modules/map_app/static/resources/screenshot.jpg)

## Table of Contents

1. [What does this tool do?](#what-does-this-tool-do)
2. [Requirements](#requirements)
3. [Installation and Running](#installation-and-running)
4. [Development Installation](#development-installation)
5. [Usage](#usage)
6. [CLI Documentation](#cli-documentation)
   - [Arguments](#arguments)
   - [Usage Notes](#usage-notes)
   - [Examples](#examples)

## What does this tool do?

This tool prepares data to run a next gen simulation by creating a run package that can be used with NGIAB.  
It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg); more information on all the data sources is available [here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).  
The raw forcing data is either the [NWM retrospective v3 forcing](https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/zarr/forcing/) data or the [AORC 1km gridded data](https://noaa-nws-aorc-v1-1-1km.s3.amazonaws.com/index.html), depending on user input.
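The choice of raw data comes down to which public bucket the gridded forcings are read from. A hypothetical sketch of that mapping (the URLs are the ones linked above; the function and dictionary names are illustrative, not the tool's actual API):

```python
# Illustrative only: maps the CLI's --source value to the public dataset
# linked above. Names and structure are assumptions, not the tool's code.
FORCING_SOURCES = {
    "nwm": "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/zarr/forcing/",
    "aorc": "https://noaa-nws-aorc-v1-1-1km.s3.amazonaws.com/",
}

def resolve_source(name: str = "nwm") -> str:
    """Return the dataset location for a --source value; defaults to nwm."""
    try:
        return FORCING_SOURCES[name]
    except KeyError:
        raise ValueError(f"unknown source {name!r}; choose 'nwm' or 'aorc'")
```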

1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath, etc.) and outputs it as a geopackage.
2. **Calculates** forcings as a weighted mean of the gridded forcing data over each catchment. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and the means are computed with numpy.
3. Creates the **configuration files** needed to run nextgen:
    -  realization.json - the ngen model configuration
    -  troute.yaml - the routing configuration
    -  **per catchment** model configuration
4. Optionally runs a non-interactive [Next gen in a box](https://github.com/CIROH-UA/NGIAB-CloudInfra).
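Step 2 boils down to an area-weighted average of grid-cell values per catchment. A minimal numpy sketch of that computation (the cell values and weights here are made up; in the tool the weights come from exactextract):

```python
import numpy as np

# Hypothetical values of one forcing variable in three grid cells that
# overlap a catchment, and the fraction of each cell the catchment covers
# (in the real tool these coverage fractions come from exactextract).
cell_values = np.array([2.0, 4.0, 6.0])
cell_weights = np.array([0.5, 1.0, 0.25])

# Weighted mean: sum(value * weight) / sum(weight)
weighted_mean = np.sum(cell_values * cell_weights) / np.sum(cell_weights)
```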

## What does it not do?

### Evaluation
For automatic evaluation using [Teehr](https://github.com/RTIInternational/teehr), please run [NGIAB](https://github.com/CIROH-UA/NGIAB-CloudInfra) interactively using the `guide.sh` script.

### Visualisation
For automatic interactive visualisation, please run [NGIAB](https://github.com/CIROH-UA/NGIAB-CloudInfra) interactively using the `guide.sh` script.

## Requirements

* This tool is officially supported on macOS or Ubuntu (tested on 22.04 & 24.04). To use it on Windows, please install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install).

## Installation and Running

```bash
# If you're installing this on jupyterhub / 2i2c you HAVE TO DEACTIVATE THE CONDA ENV
(notebook) jovyan@jupyter-user:~$ conda deactivate
jovyan@jupyter-user:~$
# The interactive map won't work on 2i2c
```    

```bash
# This tool is unlikely to work without a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# installing and running the tool
pip install 'ngiab_data_preprocess'
python -m map_app
# CLI instructions at the bottom of the README
```

The first time you run this command, it will download the hydrofabric from Lynker Spatial. If you already have it, place `conus_nextgen.gpkg` into `~/.ngiab/hydrofabric/v2.2/`.
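If you want to check whether the download can be skipped, you can verify the geopackage is already where the tool expects it. A convenience sketch (the path comes from the paragraph above; this check is not part of the tool):

```python
from pathlib import Path

# Expected location per the README; ~ expands to the user's home directory.
hydrofabric = Path("~/.ngiab/hydrofabric/v2.2/conus_nextgen.gpkg").expanduser()

if hydrofabric.is_file():
    print(f"found hydrofabric at {hydrofabric}")
else:
    print("not found; the first run of map_app will download it")
```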

## Development Installation

<details>
  <summary>Click to expand installation steps</summary>

To install and run the tool, follow these steps:

1. Clone the repository:
   ```bash
   git clone https://github.com/CIROH-UA/NGIAB_data_preprocess
   cd NGIAB_data_preprocess
   ```
2. Create a virtual environment and activate it:
   ```bash
   python3 -m venv env
   source env/bin/activate
   ```
3. Install the tool:
   ```bash
   pip install -e .
   ```
4. Run the map app:
   ```bash
   python -m map_app
   ```
</details>

## Usage

Running the command `python -m map_app` will open the app in a new browser tab.

To use the tool:
1. Select the catchment you're interested in on the map.
2. Pick the time period you want to simulate.
3. Click the following buttons in order:
    1) Create subset gpkg
    2) Create Forcing from Zarrs
    3) Create Realization

Once all the steps are finished, you can run NGIAB on the folder shown underneath the subset button.

**Note:** When using the tool, the default output will be stored in the `~/ngiab_preprocess_output/<your-input-feature>/` folder. There is no overwrite protection on the folders.
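Because there is no overwrite protection, a pre-flight check before re-running on the same feature can save existing results from being clobbered. A suggested safety wrapper, not part of the tool (the path layout follows the note above):

```python
import sys
from pathlib import Path

def ensure_fresh_output(feature_id: str) -> Path:
    """Refuse to proceed if the default output folder already exists.

    Illustrative guard only; the tool itself will happily overwrite.
    """
    out_dir = Path("~/ngiab_preprocess_output").expanduser() / feature_id
    if out_dir.exists():
        sys.exit(f"{out_dir} already exists; move or delete it first")
    return out_dir
```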

# CLI Documentation

## Arguments

- `-h`, `--help`: Show the help message and exit.
- `-i INPUT_FEATURE`, `--input_feature INPUT_FEATURE`: ID of feature to subset. Providing a prefix will automatically convert to catid, e.g., cat-5173 or gage-01646500 or wb-1234.
- `--vpu VPU_ID`: The ID of the VPU to subset, e.g., 01. Note that 10 = 10L + 10U and 03 = 03N + 03S + 03W. `--help` will display all the options.
- `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
- `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
- `-s`, `--subset`: Subset the hydrofabric to the given feature.
- `-f`, `--forcings`: Generate forcings for the given feature.
- `-r`, `--realization`: Create a realization for the given feature.
- `--start_date START_DATE`, `--start START_DATE`: Start date for forcings/realization (format YYYY-MM-DD).
- `--end_date END_DATE`, `--end END_DATE`: End date for forcings/realization (format YYYY-MM-DD).
- `-o OUTPUT_NAME`, `--output_name OUTPUT_NAME`: Name of the output folder.
- `--source`: The data source to use, either `nwm` for retrospective v3 or `aorc`. Default is `nwm`.
- `-D`, `--debug`: Enable debug logging.
- `--run`: Automatically run Next Gen against the output folder.
- `--validate`: Run every missing step required to run ngiab.
- `-a`, `--all`: Run all operations: subset, forcings, realization, and run Next Gen.
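The start and end dates must be plain YYYY-MM-DD strings. A small sketch of the kind of validation you might run before invoking the CLI (this helper is illustrative, not part of `ngiab_data_cli`):

```python
from datetime import date, datetime

def parse_cli_date(text: str) -> date:
    """Strictly parse a YYYY-MM-DD string; raises ValueError otherwise."""
    return datetime.strptime(text, "%Y-%m-%d").date()

start = parse_cli_date("2022-01-01")
end = parse_cli_date("2022-02-28")
assert start < end, "start date must come before end date"
```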

## Usage Notes
- If your input has a prefix of `gage-`, you do not need to pass `-g`.
- The `-l`, `-g`, `-s`, `-f`, `-r` flags can be combined like normal CLI flags. For example, to subset, generate forcings, and create a realization, you can use `-sfr` or `-s -f -r`.
- When using the `--all` flag, it automatically sets `subset`, `forcings`, `realization`, and `run` to `True`.
- Using the `--run` flag automatically sets the `--validate` flag.
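The notes above amount to a small set of flag-normalization rules: a `gage-` prefix implies gage mode, `--all` expands to the four main operations, and `--run` implies `--validate`. A hypothetical sketch of those rules (the function mirrors the described behaviour, not the CLI's actual code):

```python
def normalize_flags(input_feature: str, flags: set) -> set:
    """Apply the implication rules from the usage notes (illustrative only)."""
    flags = set(flags)
    # A gage- prefix makes -g redundant.
    if input_feature.startswith("gage-"):
        flags.add("gage")
    # --all expands to subset + forcings + realization + run.
    if "all" in flags:
        flags |= {"subset", "forcings", "realization", "run"}
    # --run implies --validate.
    if "run" in flags:
        flags.add("validate")
    return flags
```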

## Examples

0. Prepare everything for a nextgen run at a given gage:
   ```bash
   python -m ngiab_data_cli -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28 
   #         add --run or replace -sfr with --all to run nextgen in a box too
   # to name the folder, add -o folder_name
   ```

1. Subset hydrofabric using catchment ID or VPU:
   ```bash
   python -m ngiab_data_cli -i cat-7080 -s
   python -m ngiab_data_cli --vpu 01 -s
   ```

2. Generate forcings using a single catchment ID:
   ```bash
   python -m ngiab_data_cli -i cat-5173 -f --start 2022-01-01 --end 2022-02-28
   ```

3. Create realization using a lat/lon pair and output to a named folder:
   ```bash
   python -m ngiab_data_cli -i 54.33,-69.4 -l -r --start 2022-01-01 --end 2022-02-28 -o custom_output
   ```

4. Perform all operations using a lat/lon pair:
   ```bash
   python -m ngiab_data_cli -i 54.33,-69.4 -l -s -f -r --start 2022-01-01 --end 2022-02-28
   ```

5. Subset hydrofabric using gage ID:
   ```bash
   python -m ngiab_data_cli -i 10154200 -g -s
   # or
   python -m ngiab_data_cli -i gage-10154200 -s
   ```

6. Generate forcings using a single gage ID:
   ```bash
   python -m ngiab_data_cli -i 01646500 -g -f --start 2022-01-01 --end 2022-02-28
   ```

7. Run all operations, including running Next Gen in a box:
   ```bash
   python -m ngiab_data_cli -i cat-5173 -a --start 2022-01-01 --end 2022-02-28
   ```




            
