overflow-hydro

Name	overflow-hydro
Version	0.1.4
Summary	High-performance Python library for hydrological terrain analysis with parallel, tiled algorithms
upload_time	2025-10-17 18:37:45
author	Overflow Contributors
requires_python	>=3.11
license	MIT
keywords	hydrology terrain-analysis dem digital-elevation-model flow-accumulation watershed geospatial gis parallel-processing numba

# Overflow

Overflow is a high-performance Python library for hydrological terrain analysis that specializes in processing massive Digital Elevation Models (DEMs) through parallel, tiled algorithms. Unlike traditional GIS tools, Overflow is built from the ground up for large-scale data processing.

## Why Overflow?

### Performance at Scale
- **Parallel Processing**: Every algorithm is designed for parallel execution using Numba, with additional CUDA acceleration for supported operations
- **Memory-Efficient Tiling**: Process DEMs larger than RAM through sophisticated tiled algorithms that maintain accuracy across tile boundaries
- **Flexible Processing Modes**: Choose between in-memory processing for speed on smaller datasets or tiled processing for massive datasets

### Key Technical Advantages
- **Larger Size Limits**: Unlike open-source hydrology tools such as pysheds, or proprietary ArcGIS tools, Overflow can process very large DEMs with a much smaller memory footprint through its tiled algorithms
- **True Parallelism**: Most GRASS GIS tools, while memory efficient, are single-threaded. Overflow achieves true parallel processing through Numba
- **Programmable First**: Built as a proper Python library with both high-level and low-level APIs, not just a collection of command-line tools
- **Modern Algorithms**: Implements state-of-the-art approaches like:
  - Priority-flood depression filling
  - Least-cost path breaching
  - Graph-based flat resolution that maintains drainage patterns
  - Parallel flow accumulation that correctly handles tile boundaries

### When to Use Overflow

Choose Overflow when you need to:
- Process very large DEMs (10,000+ pixels in any dimension)
- Integrate hydrological processing into automated pipelines
- Leverage multiple CPU cores or GPU acceleration
- Handle datasets too large for traditional GIS tools
- Maintain programmatic control over the processing pipeline

### When Other Tools Might Be Better

Stick with traditional tools when:
- Working with small DEMs interactively
- Needing a GUI
- Requiring extensive visualization capabilities
- Processing speed isn't critical

## Example Use Cases

- **Large-Scale Hydrology**: Process high-resolution, continental-scale DEMs for flood modeling or watershed analysis
- **Automated Processing**: Integrate into data pipelines for batch processing multiple DEMs
- **High-Performance Computing**: Leverage parallel processing for time-critical applications
- **Memory-Constrained Environments**: Process massive datasets on machines with limited RAM

Overflow provides a comprehensive, scalable solution for extracting hydrological features from Digital Elevation Models. The entire pipeline, from initial DEM preprocessing through to stream network and watershed extraction, is designed to handle massive datasets while maintaining accuracy across tile boundaries. The result is a complete toolkit that takes you from raw DEM to finished hydrological products without being limited by available RAM.

## Key Features

### Core Tools

- **DEM Breaching**: Implements a least-cost-path breaching algorithm that eliminates depressions and produces a hydrologically correct DEM.
- **DEM Depression Filling**: An implementation of the algorithm described in <https://arxiv.org/abs/1606.06204>. Fills depressions in DEMs using a parallel priority-flood algorithm while preserving natural drainage patterns.
- **Flow Direction**: Calculates D8 flow direction using parallel processing.
- **Flow Direction Flat Resolution**: An implementation of the algorithm described in <https://www.sciencedirect.com/science/article/abs/pii/S0098300421002971>. Resolves flow directions in flat areas using gradients away from higher terrain and towards lower terrain.
- **Flow Accumulation**: An implementation of the algorithm described in <https://www.sciencedirect.com/science/article/abs/pii/S1364815216304984>. Computes flow accumulation using a parallel, tiled approach that correctly handles flow across tile boundaries.
- **Stream Network Extraction**: Delineates stream networks based on flow accumulation thresholds with proper handling of stream connectivity across tiles.
- **Basin Delineation**: Performs watershed delineation using a parallel approach that maintains basin connectivity across tile boundaries.
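
To make the D8 idea concrete, here is a minimal pure-Python sketch that picks, for each cell, the neighbor with the steepest downward slope. It is an illustration only, not Overflow's implementation: the function name `d8_flow_direction` is hypothetical, and directions are returned as `(d_row, d_col)` offsets because the direction encoding Overflow writes to its rasters is not described here.

```python
import math

# Neighbor offsets (row, col) for the eight D8 directions.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem):
    """Return, for each cell, the (d_row, d_col) offset of its steepest
    downslope neighbor, or None when no neighbor is lower (pit or flat)."""
    rows, cols = len(dem), len(dem[0])
    directions = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in D8_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbors are sqrt(2) cell widths away.
                    distance = math.hypot(dr, dc)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        directions[r][c] = (dr, dc)
    return directions


dem = [
    [9.0, 8.0, 7.0],
    [8.0, 5.0, 4.0],
    [7.0, 4.0, 1.0],
]
directions = d8_flow_direction(dem)
```

Cells left as `None` are exactly the pits and flats that the breaching, filling, and flat resolution tools are meant to handle.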

### Tiled Processing

All algorithms in Overflow are designed to process DEMs in tiles, enabling the handling of datasets larger than available RAM. The algorithms maintain correctness across tile boundaries through sophisticated edge handling and graph-based approaches.

### Parallel Processing

Overflow utilizes parallel processing at multiple levels:
- Tile-level parallelism where multiple tiles are processed concurrently
- Within-tile parallelism using Numba for CPU acceleration
- Optional CUDA implementation for breach path calculation on GPUs

### Depression Handling

Overflow provides two approaches for handling depressions in DEMs:
1. **Breaching**: Uses a least-cost path algorithm to create drainage paths through barriers
2. **Filling**: Implements a parallel priority-flood algorithm to fill depressions while preserving natural drainage patterns
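
For readers new to priority-flood, the following is a minimal serial sketch of the idea on a small in-memory grid. It is not Overflow's parallel, tiled implementation, and the name `priority_flood_fill` is hypothetical. Cells are visited from the grid edge inward in order of elevation, and any cell lower than the water level that reached it is raised to that level.

```python
import heapq


def priority_flood_fill(dem):
    """Serial priority-flood: raise every depression to its spill elevation."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []

    # Seed the queue with every edge cell; water can always leave the grid there.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True

    while heap:
        elevation, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A neighbor below the current water level sits in a depression.
                    filled[nr][nc] = max(filled[nr][nc], elevation)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


dem = [
    [5, 5, 5, 5, 5],
    [5, 2, 3, 2, 5],
    [5, 3, 1, 3, 4],
    [5, 2, 3, 2, 5],
    [5, 5, 5, 5, 5],
]
filled = priority_flood_fill(dem)
```

In this example the only gap in the rim is the edge cell with elevation 4, so the whole interior basin is raised to 4. Filling leaves flat surfaces behind, which is why flat resolution is a separate step.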

### Flow Direction in Flat Areas

The library implements an advanced flat resolution algorithm based on:
- Gradient away from higher terrain
- Gradient towards lower terrain
- Combination of both gradients to create realistic flow patterns
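
The three ingredients above can be sketched with two breadth-first searches over a single flat. This is a simplified illustration under stated assumptions, not the library's graph-based tiled algorithm: the names `bfs_distance` and `flat_mask` are hypothetical, all cells at `flat_elevation` are treated as one connected flat with at least one outlet, and weighting the towards-lower gradient double is an illustrative choice so that it always wins over the away-from-higher gradient.

```python
from collections import deque

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbors_of(dem, r, c):
    """Yield the in-bounds 8-connected neighbors of (r, c)."""
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(dem) and 0 <= nc < len(dem[0]):
            yield nr, nc


def bfs_distance(dem, flat, seeds):
    """Breadth-first step count from the seed cells, staying inside the flat."""
    dist = {cell: 0 for cell in seeds}
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for cell in neighbors_of(dem, r, c):
            if cell in flat and cell not in dist:
                dist[cell] = dist[(r, c)] + 1
                queue.append(cell)
    return dist


def flat_mask(dem, flat_elevation):
    """Combine both gradients into one surface; flow follows decreasing values."""
    flat = {
        (r, c)
        for r, row in enumerate(dem)
        for c, z in enumerate(row)
        if z == flat_elevation
    }
    high_edges = [p for p in flat
                  if any(dem[r][c] > flat_elevation for r, c in neighbors_of(dem, *p))]
    low_edges = [p for p in flat
                 if any(dem[r][c] < flat_elevation for r, c in neighbors_of(dem, *p))]
    away = bfs_distance(dem, flat, high_edges)
    towards = bfs_distance(dem, flat, low_edges)
    max_away = max(away.values(), default=0)
    # Invert the "away" distances so values fall with distance from higher terrain.
    # Assumes every flat cell can reach an outlet; a closed flat raises KeyError.
    return {p: 2 * towards[p] + (max_away - away.get(p, max_away)) for p in flat}


# One row: higher terrain (3) on the left, a four-cell flat (2), an outlet (1) on the right.
mask = flat_mask([[3, 2, 2, 2, 2, 1]], flat_elevation=2)
```

The resulting values decrease monotonically from the cell next to the higher terrain to the cell next to the outlet, so a D8 pass over the mask drains the flat.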

### Memory Efficiency

The tiled approach allows processing of very large datasets with minimal memory requirements:
- Each tile is processed independently
- Only tile edges are kept in memory for cross-tile connectivity
- Efficient data structures minimize memory overhead
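
As a rough picture of how a raster is cut into tiles, the sketch below enumerates the windows for a given chunk size. The generator name `iter_tiles` is hypothetical and the cross-tile edge bookkeeping that Overflow performs is not shown; a non-positive `chunk_size` is treated as a single window, matching the in-memory convention used by the CLI.

```python
def iter_tiles(height, width, chunk_size):
    """Yield (row_start, row_stop, col_start, col_stop) windows covering the raster."""
    if chunk_size <= 0:
        # A non-positive size means one window over the whole raster (in-memory mode).
        yield 0, height, 0, width
        return
    for row_start in range(0, height, chunk_size):
        for col_start in range(0, width, chunk_size):
            yield (row_start, min(row_start + chunk_size, height),
                   col_start, min(col_start + chunk_size, width))


tiles = list(iter_tiles(height=5000, width=4500, chunk_size=2000))
```

Windows on the right and bottom borders are clipped to the raster extent, so tiles there are smaller than `chunk_size`.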

## Installation

### Recommended Installation

The recommended approach is to use conda/mamba for system dependencies (GDAL, Numba, CUDA) and pip for installing Overflow:

```bash
# Create a new conda environment with required system dependencies
conda create -n overflow python=3.11 gdal=3.8.4 numba=0.59.0 numpy=1.26.4 -c conda-forge

# Activate the environment
conda activate overflow

# Install overflow from PyPI
pip install overflow-hydro
```

**With CUDA support (optional, for GPU acceleration):**

```bash
# Create environment with CUDA support
conda create -n overflow python=3.11 gdal=3.8.4 numba=0.59.0 numpy=1.26.4 \
    cuda-nvrtc=12.3.107 cuda-nvcc=12.3.107 -c conda-forge

conda activate overflow
pip install overflow-hydro
```

## Requirements

**System Dependencies:**
- GDAL >= 3.8
- CUDA Toolkit >= 12.3 (optional, for GPU acceleration)

**Python Dependencies (automatically installed via pip):**
- Python >= 3.11
- NumPy >= 1.26
- Numba >= 0.59
- Click >= 8.0
- Rich >= 13.0
- Shapely >= 2.0
- psutil >= 6.0
- tqdm >= 4.62

## Performance Considerations

- Choose chunk sizes based on available RAM and dataset size
- Larger chunk sizes generally provide better performance but require more memory
- In some cases, the tiled approach may be slower than in-memory processing for small datasets but enables processing of much larger ones
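
One way to turn the first two points into a starting value is a back-of-the-envelope estimate like the one below. Every parameter here is an assumption for illustration (`bytes_per_cell`, `arrays_per_tile`, and the `safety` fraction are hypothetical defaults, and `estimate_chunk_size` is not part of Overflow); actual memory use depends on the algorithm, so treat the result as an upper bound to experiment from.

```python
import math


def estimate_chunk_size(available_bytes, bytes_per_cell=4, arrays_per_tile=4, safety=0.5):
    """Largest square tile edge (in cells) whose working arrays fit the memory budget."""
    budget = available_bytes * safety
    cells = budget / (bytes_per_cell * arrays_per_tile)
    return max(1, math.isqrt(int(cells)))


# 8 GiB available, half of it budgeted, four float32-sized arrays per tile.
chunk_size = estimate_chunk_size(8 * 1024**3)
```

The available memory could be read with `psutil.virtual_memory().available`, since psutil is already among the listed dependencies.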

## Output Formats

- All raster outputs are in GeoTIFF format
- Stream networks are saved as GeoPackage files containing both vector lines and junction points
- Watershed boundaries are saved in both raster (GeoTIFF) and vector (GeoPackage) formats


## Basic Usage

## Command Line Interface

Overflow provides a comprehensive command line interface for processing DEMs and performing hydrological analysis:

### Full DEM Processing Pipeline

```bash
python overflow_cli.py process-dem \
    --dem_file input.tif \
    --output_dir results \
    --chunk_size 2000 \
    --search_radius_ft 200 \
    --da_sqmi 1 \
    --basins \
    --fill_holes
```
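
For batch runs over several DEMs, the command above can be assembled programmatically. The sketch below only builds argument lists from the flags shown in this section; the helper names `build_process_dem_command` and `build_batch`, and the one-subdirectory-per-DEM layout, are assumptions for illustration.

```python
import shlex
from pathlib import Path


def build_process_dem_command(dem_file, output_dir, chunk_size=2000,
                              search_radius_ft=200, da_sqmi=1,
                              basins=True, fill_holes=True):
    """Assemble the argv list for one `process-dem` run."""
    command = [
        "python", "overflow_cli.py", "process-dem",
        "--dem_file", str(dem_file),
        "--output_dir", str(output_dir),
        "--chunk_size", str(chunk_size),
        "--search_radius_ft", str(search_radius_ft),
        "--da_sqmi", str(da_sqmi),
    ]
    if basins:
        command.append("--basins")
    if fill_holes:
        command.append("--fill_holes")
    return command


def build_batch(dem_files, output_root):
    """One command per DEM, each writing to its own subdirectory of output_root."""
    return [
        build_process_dem_command(dem, Path(output_root) / Path(dem).stem)
        for dem in dem_files
    ]


commands = build_batch(["north.tif", "south.tif"], "results")
for command in commands:
    # Print for review; pass `command` to subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```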

### Individual Operations

#### Breach Single Cell Pits
```bash
python overflow_cli.py breach-single-cell-pits \
    --input_file dem.tif \
    --output_file breached.tif \
    --chunk_size 2000
```

#### Breach Paths (Least Cost)
```bash
python overflow_cli.py breach-paths-least-cost \
    --input_file dem.tif \
    --output_file breached.tif \
    --chunk_size 2000 \
    --search_radius 200 \
    --max_cost 100

# With CUDA acceleration
python overflow_cli.py breach-paths-least-cost \
    --input_file dem.tif \
    --output_file breached.tif \
    --cuda \
    --max_pits 10000
```

#### Fill Depressions
```bash
python overflow_cli.py fill-depressions \
    --dem_file dem.tif \
    --output_file filled.tif \
    --chunk_size 2000 \
    --working_dir temp \
    --fill_holes
```

#### Calculate Flow Direction
```bash
python overflow_cli.py flow-direction \
    --input_file dem.tif \
    --output_file flowdir.tif \
    --chunk_size 2000
```

#### Fix Flats in Flow Direction
```bash
python overflow_cli.py fix-flats \
    --dem_file dem.tif \
    --fdr_file flowdir.tif \
    --output_file flowdir_fixed.tif \
    --chunk_size 2000 \
    --working_dir temp
```

#### Calculate Flow Accumulation
```bash
python overflow_cli.py flow-accumulation \
    --fdr_file flowdir.tif \
    --output_file flowacc.tif \
    --chunk_size 2000
```

#### Extract Stream Network
```bash
python overflow_cli.py extract-streams \
    --fac_file flowacc.tif \
    --fdr_file flowdir.tif \
    --output_dir streams \
    --cell_count_threshold 5 \
    --chunk_size 2000
```

#### Delineate Watersheds
```bash
python overflow_cli.py label-watersheds \
    --fdr_file flowdir.tif \
    --dp_file points.gpkg \
    --output_file basins.tif \
    --chunk_size 2000 \
    --all_basins
```

### Key Parameters

- `chunk_size`: Controls tile size for processing. Larger values use more memory but may be faster. Default is 2000.
- `search_radius`: Distance to search for breach paths (in cells).
- `search_radius_ft`: Distance to search for breach paths (in feet, automatically converted to cells).
- `da_sqmi`: Minimum drainage area in square miles for stream extraction.
- `cell_count_threshold`: Minimum number of cells draining to a point to be considered a stream.
- `max_cost`: Maximum elevation that can be removed when breaching paths.
- `working_dir`: Directory for temporary files during processing.
- `fill_holes`: Flag to fill no-data holes in the DEM.
- `all_basins`: Flag to delineate all watersheds, not just those upstream of drainage points.

### Notes

- All operations support both in-memory (`chunk_size <= 0`) and tiled processing modes
- For large datasets, use tiled processing with an appropriate `chunk_size`
- GPU acceleration is available for breach path calculations via the `--cuda` flag
- Most operations write only GeoTIFF rasters; stream extraction and watershed delineation also write GeoPackage vector files

## Python API

### Individual Operations

#### DEM Pit Processing

```python
from overflow import breach_single_cell_pits, breach_paths_least_cost, breach_paths_least_cost_cuda

# Breach single cell pits
breach_single_cell_pits(
    input_path="dem.tif",
    output_path="breached_pits.tif",
    chunk_size=2000
)

# Breach paths using least cost algorithm (CPU)
breach_paths_least_cost(
    input_path="dem.tif",
    output_path="breached_paths.tif",
    chunk_size=2000,
    search_radius=200,  # cells to search for breach path
    max_cost=100       # maximum elevation that can be removed
)

# Breach paths using CUDA acceleration
breach_paths_least_cost_cuda(
    input_path="dem.tif",
    output_path="breached_cuda.tif",
    chunk_size=2000,
    search_radius=200,
    max_pits=10000,    # maximum pits to process per chunk
    max_cost=100
)
```

#### Depression Filling

```python
from overflow import fill_depressions, fill_depressions_tiled

# In-memory depression filling
fill_depressions(
    dem_file="dem.tif",
    output_file="filled.tif",
    fill_holes=True    # fill no-data holes in DEM
)

# Tiled depression filling for large DEMs
fill_depressions_tiled(
    dem_file="dem.tif",
    output_file="filled.tif",
    chunk_size=2000,
    working_dir="temp",
    fill_holes=True
)
```

#### Flow Direction and Flat Resolution

```python
from overflow import flow_direction
from overflow.fix_flats.core import fix_flats_from_file
from overflow.fix_flats.tiled import fix_flats_tiled

# Calculate flow direction
flow_direction(
    input_path="dem.tif",
    output_path="flowdir.tif",
    chunk_size=2000
)

# Fix flats in-memory
fix_flats_from_file(
    dem_file="dem.tif",
    fdr_file="flowdir.tif",
    output_file="flowdir_fixed.tif"
)

# Fix flats using tiled approach
fix_flats_tiled(
    dem_file="dem.tif",
    fdr_file="flowdir.tif",
    output_file="flowdir_fixed.tif",
    chunk_size=2000,
    working_dir="temp"
)
```

#### Flow Accumulation

```python
from overflow import flow_accumulation, flow_accumulation_tiled

# Calculate flow accumulation in-memory
flow_accumulation(
    fdr_path="flowdir.tif",
    output_path="flowacc.tif"
)

# Calculate flow accumulation using tiled approach
flow_accumulation_tiled(
    fdr_file="flowdir.tif",
    output_file="flowacc.tif",
    chunk_size=2000
)
```
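
To show what these functions compute, here is a minimal serial sketch of flow accumulation on a tiny grid, processing cells in topological order so that each cell is finished before it passes its count downstream. It is not the tiled algorithm Overflow implements. The name `flow_accumulation_serial`, the `(d_row, d_col)` direction representation, and the convention that every cell counts itself are assumptions for illustration.

```python
from collections import deque


def flow_accumulation_serial(directions):
    """Count, for each cell, itself plus every cell that drains through it.

    `directions[r][c]` is the (d_row, d_col) offset of the downstream
    neighbor, or None for an outlet.
    """
    rows, cols = len(directions), len(directions[0])

    def downstream(r, c):
        step = directions[r][c]
        if step is None:
            return None
        nr, nc = r + step[0], c + step[1]
        return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else None

    # Number of upstream neighbors that still have to report to each cell.
    pending = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                pending[target[0]][target[1]] += 1

    accumulation = [[1] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if pending[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is not None:
            tr, tc = target
            accumulation[tr][tc] += accumulation[r][c]
            pending[tr][tc] -= 1
            if pending[tr][tc] == 0:
                queue.append(target)
    return accumulation


E, S = (0, 1), (1, 0)  # flow east, flow south
directions = [
    [E, E, S],
    [E, E, S],
    [E, E, None],
]
accumulation = flow_accumulation_serial(directions)
```

The outlet in the bottom-right corner ends up with a count of 9, the number of cells in the grid.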

#### Stream Network Extraction

```python
from overflow import extract_streams, extract_streams_tiled

# Extract streams in-memory
extract_streams(
    fac_path="flowacc.tif",
    fdr_path="flowdir.tif",
    output_dir="results",
    cell_count_threshold=1000  # minimum drainage area in cells
)

# Extract streams using tiled approach
extract_streams_tiled(
    fac_file="flowacc.tif",
    fdr_file="flowdir.tif",
    output_dir="results",
    cell_count_threshold=1000,
    chunk_size=2000
)
```
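
Conceptually, the threshold turns the accumulation raster into a stream mask, and junctions are stream cells fed by more than one upstream stream cell. The sketch below illustrates that on a tiny grid. The names `stream_mask` and `stream_junctions` are hypothetical, directions are `(d_row, d_col)` offsets rather than Overflow's raster encoding, and whether the library compares with `>=` or `>` is an assumption.

```python
def stream_mask(accumulation, cell_count_threshold):
    """Mark cells whose accumulated cell count meets the threshold."""
    return [[value >= cell_count_threshold for value in row] for row in accumulation]


def stream_junctions(mask, directions):
    """Stream cells that receive flow from two or more upstream stream cells."""
    rows, cols = len(mask), len(mask[0])
    inflow = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and directions[r][c] is not None:
                nr, nc = r + directions[r][c][0], c + directions[r][c][1]
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]:
                    inflow[nr][nc] += 1
    return [(r, c) for r in range(rows) for c in range(cols) if inflow[r][c] >= 2]


E, S = (0, 1), (1, 0)  # flow east, flow south
directions = [
    [E, E, S],
    [E, E, S],
    [E, E, None],
]
accumulation = [
    [1, 2, 3],
    [1, 2, 6],
    [1, 2, 9],
]
mask = stream_mask(accumulation, cell_count_threshold=2)
junctions = stream_junctions(mask, directions)
```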

#### Watershed Delineation

```python
from overflow.basins.core import label_watersheds_from_file, drainage_points_from_file
from overflow.basins.tiled import label_watersheds_tiled

# Get drainage points from vector file
drainage_points = drainage_points_from_file(
    fdr_filepath="flowdir.tif",
    drainage_points_file="points.gpkg",
    layer_name=None    # use first layer if None
)

# Delineate watersheds in-memory
label_watersheds_from_file(
    fdr_filepath="flowdir.tif",
    drainage_points_file="points.gpkg",
    output_file="basins.tif",
    all_basins=True,   # delineate all basins vs only those upstream of points
    dp_layer=None
)

# Delineate watersheds using tiled approach
label_watersheds_tiled(
    fdr_filepath="flowdir.tif",
    drainage_points=drainage_points,
    output_file="basins.tif",
    chunk_size=2000,
    all_basins=True
)
```
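
The labeling step can be pictured as a breadth-first walk upstream from each drainage point. The sketch below is a small in-memory illustration, not the tiled implementation: `label_watersheds` is a hypothetical name, directions are `(d_row, d_col)` offsets, basin ids are 1-based with 0 meaning unlabeled, and the `all_basins` behavior is not covered.

```python
from collections import deque


def label_watersheds(directions, drainage_points):
    """Label every cell that drains to a drainage point with that point's 1-based id.

    Cells that reach no drainage point keep the label 0.
    """
    rows, cols = len(directions), len(directions[0])
    # Build the reverse graph: for each cell, the cells that flow directly into it.
    upstream = {}
    for r in range(rows):
        for c in range(cols):
            step = directions[r][c]
            if step is not None:
                target = (r + step[0], c + step[1])
                if 0 <= target[0] < rows and 0 <= target[1] < cols:
                    upstream.setdefault(target, []).append((r, c))

    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    for basin_id, point in enumerate(drainage_points, start=1):
        labels[point[0]][point[1]] = basin_id
        queue.append(point)
    while queue:
        cell = queue.popleft()
        for r, c in upstream.get(cell, []):
            if labels[r][c] == 0:  # stop at cells already claimed by another point
                labels[r][c] = labels[cell[0]][cell[1]]
                queue.append((r, c))
    return labels


E, S = (0, 1), (1, 0)  # flow east, flow south
directions = [
    [E, E, S],
    [E, E, S],
    [E, E, None],
]
labels = label_watersheds(directions, drainage_points=[(2, 2), (0, 2)])
```

Because every drainage point is labeled before the walk starts, a point nested inside another basin keeps its own upstream area: here the top row belongs to the second point even though it ultimately drains to the first.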

### Processing Modes

- All major operations support both in-memory and tiled processing
- Tiled mode: Use the functions with the `_tiled` suffix and specify `chunk_size > 0`
- CUDA acceleration is available only for breach path calculation

### Memory Considerations

- In-memory mode loads entire dataset into RAM
- Tiled mode processes data in chunks, using less memory
- Larger chunk sizes generally improve performance but require more memory
- Working directory required for temporary files in tiled mode
- CUDA implementation requires additional GPU memory

### Output Formats

- Most operations output GeoTIFF rasters
- Stream network extraction produces:
  - `streams.tif`: Raster representation of streams
  - `streams.gpkg`: Vector representation with streams and junction points
- Watershed delineation produces:
  - `basins.tif`: Raster representation of watersheds
  - `basins.gpkg`: Vector representation of watershed boundaries

### Unit Conversion Utilities

```python
from overflow.util.raster import sqmi_to_cell_count, feet_to_cell_count

# Convert feet to cell count based on DEM resolution
cells = feet_to_cell_count(200, "dem.tif")  # 200ft to cells

# Convert square miles to cell count
cells = sqmi_to_cell_count(1, "dem.tif")    # 1 sq mile to cells
```
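
The arithmetic behind such conversions is straightforward once the cell size is known. The sketch below takes the cell size in feet as an explicit argument instead of reading it from a raster. The function names and the round-up behavior are assumptions for illustration and may differ from what `overflow.util.raster` does.

```python
import math

FEET_PER_MILE = 5280


def feet_to_cells(distance_ft, cell_size_ft):
    """Number of whole cells needed to span a distance, rounded up."""
    return math.ceil(distance_ft / cell_size_ft)


def sqmi_to_cells(area_sqmi, cell_size_ft):
    """Number of cells whose combined area covers the given square miles, rounded up."""
    area_sqft = area_sqmi * FEET_PER_MILE ** 2
    return math.ceil(area_sqft / cell_size_ft ** 2)


# For a DEM with 10 ft cells:
radius_cells = feet_to_cells(200, cell_size_ft=10)
threshold_cells = sqmi_to_cells(1, cell_size_ft=10)
```

With 10 ft cells, a 200 ft search radius spans 20 cells and one square mile covers 278,784 cells.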

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "overflow-hydro",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "hydrology, terrain-analysis, dem, digital-elevation-model, flow-accumulation, watershed, geospatial, gis, parallel-processing, numba",
    "author": "Overflow Contributors",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/88/46/e3b26d1a68223118dcc4005549c91330973b2a56f99b236b0c151a9006e5/overflow_hydro-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "High-performance Python library for hydrological terrain analysis with parallel, tiled algorithms",
    "version": "0.1.4",
    "project_urls": {
        "Documentation": "https://github.com/fema-ffrd/overflow#readme",
        "Homepage": "https://github.com/fema-ffrd/overflow",
        "Issues": "https://github.com/fema-ffrd/overflow/issues",
        "Repository": "https://github.com/fema-ffrd/overflow"
    },
    "split_keywords": [
        "hydrology",
        " terrain-analysis",
        " dem",
        " digital-elevation-model",
        " flow-accumulation",
        " watershed",
        " geospatial",
        " gis",
        " parallel-processing",
        " numba"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c053ee1939643ba25e9c013ab5407a89dc69edf4693f98f0e8622c0e89fad5b",
                "md5": "5d5aea9781b87ca66d93f598e3b6f3e0",
                "sha256": "dacd31a1f622ed1f1fd374fc1cada5386899d04aec00fd7c0d3a816689d19531"
            },
            "downloads": -1,
            "filename": "overflow_hydro-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5d5aea9781b87ca66d93f598e3b6f3e0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 109136,
            "upload_time": "2025-10-17T18:37:44",
            "upload_time_iso_8601": "2025-10-17T18:37:44.124149Z",
            "url": "https://files.pythonhosted.org/packages/2c/05/3ee1939643ba25e9c013ab5407a89dc69edf4693f98f0e8622c0e89fad5b/overflow_hydro-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8846e3b26d1a68223118dcc4005549c91330973b2a56f99b236b0c151a9006e5",
                "md5": "27d86f848b72cd628633827dfd662b29",
                "sha256": "e71c9b560080b9b30a3e6d3f334dbdf45d0336ac95a268823a28f8c3dee059c7"
            },
            "downloads": -1,
            "filename": "overflow_hydro-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "27d86f848b72cd628633827dfd662b29",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 93363,
            "upload_time": "2025-10-17T18:37:45",
            "upload_time_iso_8601": "2025-10-17T18:37:45.429527Z",
            "url": "https://files.pythonhosted.org/packages/88/46/e3b26d1a68223118dcc4005549c91330973b2a56f99b236b0c151a9006e5/overflow_hydro-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-17 18:37:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fema-ffrd",
    "github_project": "overflow",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "overflow-hydro"
}
