# hydrodataset

[![image](https://img.shields.io/pypi/v/hydrodataset.svg)](https://pypi.python.org/pypi/hydrodataset)
[![image](https://img.shields.io/conda/vn/conda-forge/hydrodataset.svg)](https://anaconda.org/conda-forge/hydrodataset)

**A Python package for accessing hydrological datasets, with a focus on preparing data for deep learning models.**

-   Free software: MIT license
-   Documentation: https://OuyangWenyu.github.io/hydrodataset

## Core Philosophy

This library has been redesigned to serve as a powerful data-adapting layer on top of the [AquaFetch](https://github.com/hyex-research/AquaFetch) package.

While `AquaFetch` handles the complexities of downloading and reading numerous public hydrological datasets, `hydrodataset` takes the next step: it standardizes this data into a clean, consistent NetCDF (`.nc`) format. This format is specifically optimized for seamless integration with hydrological modeling libraries like [torchhydro](https://github.com/OuyangWenyu/torchhydro).

The core workflow is:
1.  **Fetch**: Use a `hydrodataset` class for a specific dataset (e.g., `CamelsAus`).
2.  **Standardize**: The class fetches the raw data through `AquaFetch` and maps it to standardized variable names, exposing a consistent, unified interface across all datasets.
3.  **Cache**: On the first run, `hydrodataset` processes the data into an `xarray.Dataset` and saves it as separate `.nc` files for time series and attributes, in a local directory configured via `hydro_setting.yml` in your home directory (see the sketch below).
4.  **Access**: All subsequent data requests are read directly from the fast `.nc` cache, giving you analysis-ready data instantly.
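
Because the cache is plain NetCDF, you can also inspect it directly with `xarray` once it exists. Below is a minimal sketch; the cache directory path and the file name `camels_us_timeseries.nc` are hypothetical placeholders (the actual location comes from your `hydro_setting.yml`, and file names depend on the dataset):

```python
import os

import xarray as xr

# Hypothetical cache location; in practice, use the path from hydro_setting.yml
cache_dir = os.path.expanduser("~/data/waterism/cache")
ts_file = os.path.join(cache_dir, "camels_us_timeseries.nc")  # assumed file name

# Open the cached dataset lazily and inspect its contents
with xr.open_dataset(ts_file) as cached:
    print(cached.data_vars)  # standardized variables, e.g. streamflow
    print(cached.coords)     # typically basin and time coordinates
```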

## Installation

We strongly recommend using a virtual environment to manage dependencies.

### For Users

To install the package from PyPI, you can use `pip` or any other package manager. Here is an example using Python's built-in `venv`:

```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`

# Install the package
pip install hydrodataset
```
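
As the conda-forge badge above indicates, the package is also published on conda-forge, so installation with `conda` (or `mamba`) should work as well:

```bash
conda install -c conda-forge hydrodataset
```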

### For Developers

This project uses [uv](https://github.com/astral-sh/uv) for package and environment management. To work on the project locally:

```bash
# Clone the repository
git clone https://github.com/OuyangWenyu/hydrodataset.git
cd hydrodataset

# Create a virtual environment and install dependencies using uv
uv sync --all-extras
```
This command will install the base dependencies plus all optional dependencies for development and documentation.
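
Once the environment is synced, you can run commands inside it with `uv run`. For example, assuming the project's test suite uses `pytest` (an assumption, not verified here):

```bash
# Execute the test suite in the uv-managed environment
uv run pytest
```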

## Core API and Usage

The primary goal of `hydrodataset` is to provide a simple, unified API for accessing various hydrological datasets. The core interface is exposed through the dataset objects. A typical workflow is demonstrated in `examples/read_dataset.py` and summarized below.

First, initialize the dataset class you want to use. Then, you can explore the available data and read it.

```python
from hydrodataset.camels_us import CamelsUs
from hydrodataset import SETTING

# All datasets are expected to be in the directory defined in your hydro_setting.yml.
# An example hydro_setting.yml on Windows looks like this:
# local_data_path:
#   root: 'D:\data\waterism' # Update with your root data directory
#   datasets-origin: 'D:\data\waterism\datasets-origin'
#   cache: 'D:\data\waterism\cache'
data_path = SETTING["local_data_path"]["datasets-origin"]

# Initialize the dataset class
ds = CamelsUs(data_path)

# 1. Check which features are available
print("Available static features:")
print(ds.available_static_features)

print("\nAvailable dynamic features:")
print(ds.available_dynamic_features)

# 2. Get a list of all basin IDs
basin_ids = ds.read_object_ids()

# 3. Read static (attribute) data for a subset of basins
# Note: We use standardized names like 'area' and 'p_mean'
attr_data = ds.read_attr_xrdataset(
    gage_id_lst=basin_ids[:2],
    var_lst=["area", "p_mean"]
)
print("\nStatic attribute data:")
print(attr_data)

# 4. Read dynamic (time-series) data for the same basins
# Note: We use standardized names like 'streamflow' and 'precipitation'
ts_data = ds.read_ts_xrdataset(
    gage_id_lst=basin_ids[:2],
    t_range=["1990-01-01", "1995-12-31"],
    var_lst=["streamflow", "precipitation"]
)
print("\nTime-series data:")
print(ts_data)
```
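
Both readers return `xarray.Dataset` objects, so the usual xarray tooling applies. As a sketch, assuming the basin dimension is named `basin` (check `ts_data.dims` for the actual name), you can pull one basin's streamflow into a pandas DataFrame:

```python
# Select one basin's streamflow and convert it to a pandas DataFrame
# (the dimension name 'basin' is an assumption; inspect ts_data.dims to confirm)
flow = ts_data["streamflow"].sel(basin=basin_ids[0])
print(flow.to_dataframe().head())
```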

### Standardized Variable Names

A key feature of the new architecture is the use of standardized variable names. This allows you to use the same variable name to fetch the same type of data across different datasets, without needing to know the specific, internal naming scheme of each one.

For example, you can get streamflow from both CAMELS-US and CAMELS-AUS using the same variable name:

```python
# us_ds and aus_ds are initialized dataset instances
# (e.g., CamelsUs and CamelsAus, constructed as shown above)

# Get streamflow from CAMELS-US
us_ds.read_ts_xrdataset(gage_id_lst=["01013500"], var_lst=["streamflow"], t_range=["1990-01-01", "1995-12-31"])

# Get streamflow from CAMELS-AUS
aus_ds.read_ts_xrdataset(gage_id_lst=["A4260522"], var_lst=["streamflow"], t_range=["1990-01-01", "1995-12-31"])
```

Similarly, you can use `precipitation`, `temperature_max`, etc., across datasets. A comprehensive list of these standardized names and their coverage across all datasets is in progress and will be published soon.
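
Because the names are shared, the same retrieval code can be reused across datasets. A short sketch, assuming `us_ds` and `aus_ds` are the initialized instances from above:

```python
# Fetch streamflow from several datasets with identical code
for name, d in {"camels_us": us_ds, "camels_aus": aus_ds}.items():
    flow = d.read_ts_xrdataset(
        gage_id_lst=d.read_object_ids()[:1],
        t_range=["1990-01-01", "1995-12-31"],
        var_lst=["streamflow"],
    )
    print(name, flow)
```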

## Project Status & Future Work

The new, unified API architecture is currently in active development.

*   **Current Implementation**: The framework has been fully implemented and tested for the **`camels_us`** and **`camels_aus`** datasets.
*   **In Progress**: We are in the process of migrating all other datasets supported by the library to this new architecture.
*   **Release Schedule**: We plan to release new versions frequently in the short term as more datasets are integrated. Please check back for updates.

## Credits

This package was created with [Cookiecutter](https://github.com/cookiecutter/cookiecutter) and the [giswqs/pypackage](https://github.com/giswqs/pypackage) project template. Data fetching and reading are now powered by [AquaFetch](https://github.com/hyex-research/AquaFetch).