multiclean


- **Name**: multiclean
- **Version**: 0.1.2
- **Summary**: Python library for morphological cleaning of multiclass 2D numpy arrays (edge smoothing and island removal)
- **Upload time**: 2025-09-02 07:59:25
- **Requires Python**: >=3.9
- **Keywords**: morphological-operations, multiclass, classification-arrays, image-processing, edge-cleaning, island-removal, segmentation, post-processing, numpy, opencv, remote-sensing, computer-vision, noise-reduction

# MultiClean
[![image](https://img.shields.io/pypi/v/multiclean.svg)](https://pypi.python.org/pypi/multiclean)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tutorials](https://img.shields.io/badge/Tutorials-Learn-brightgreen)](https://github.com/DPIRD-DMA/MultiClean/tree/main/notebooks)

**MultiClean** is a Python library for morphological cleaning of multiclass 2D numpy arrays (segmentation masks and classification rasters). It provides efficient tools for edge smoothing and small-island removal across multiple classes, then fills gaps using the nearest valid class.

## Visual Example

Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).

<img src="https://raw.githubusercontent.com/DPIRD-DMA/MultiClean/main/assets/land_use_before_after.png" alt="Land Use before/after" width="900"/>

## Key Features

- **Multi-class processing**: Clean all classes in one pass
- **Edge smoothing**: Morphological opening to reduce jagged boundaries
- **Island removal**: Remove small connected components per class
- **Gap filling**: Fill invalids via nearest valid class (distance transform)
- **Fast**: NumPy + OpenCV + SciPy with parallelism

## Installation

```bash
pip install multiclean
```
or
```bash
uv add multiclean
```


## Quick Start

```python
import numpy as np
from multiclean import clean_array

# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)

# Clean with default parameters
cleaned = clean_array(array)

# Custom parameters
cleaned = clean_array(
    array,
    class_values=[0, 1, 2, 3],
    smooth_edge_size=2,     # kernel width, larger value increases smoothness
    min_island_size=100,    # remove components with area < 100
    connectivity=8,         # 4 or 8
    max_workers=4,
)
```
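
Uniformly random labels have no spatial structure, so the effect of cleaning is hard to judge on the array above. The following is a minimal sketch of a more structured test input, using only NumPy: four class quadrants with a small fraction of pixels flipped to random classes. The helper name `make_noisy_quadrants` and its parameters are illustrative and are not part of MultiClean.

```python
import numpy as np


def make_noisy_quadrants(size=200, noise_fraction=0.05, seed=0):
    """Build a (size, size) label array: four class quadrants plus speckle noise."""
    rng = np.random.default_rng(seed)
    half = size // 2
    labels = np.zeros((size, size), dtype=np.int32)
    labels[:half, half:] = 1
    labels[half:, :half] = 2
    labels[half:, half:] = 3

    # Flip a random subset of pixels to a random class (0..3) to mimic speckle
    noisy = rng.random((size, size)) < noise_fraction
    labels[noisy] = rng.integers(0, 4, size=int(noisy.sum()), dtype=np.int32)
    return labels


array = make_noisy_quadrants()
print(array.shape, np.unique(array))
```

Passing this `array` to `clean_array(array)` should make the before/after difference easier to inspect than with pure noise, since the flipped pixels form small islands inside large uniform regions.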

## Examples

See the notebooks folder for end-to-end examples:
- [Land Use Example Notebook](https://github.com/DPIRD-DMA/MultiClean/blob/main/notebooks/Land%20use%20example.ipynb): land use classification cleaning
- [Cloud Example Notebook](https://github.com/DPIRD-DMA/MultiClean/blob/main/notebooks/Cloud%20example.ipynb): cloud/shadow classification cleaning

## Try in Colab

[![Colab_Button]][Link]

[Link]: https://colab.research.google.com/github/DPIRD-DMA/MultiClean/blob/main/notebooks/Land%20use%20example%20(Colab).ipynb 'Try MultiClean In Colab'

[Colab_Button]: https://img.shields.io/badge/Try%20in%20Colab-grey?style=for-the-badge&logo=google-colab

## Use Cases

MultiClean is designed for cleaning segmentation outputs from:

- **Remote sensing**: Land cover classification, crop mapping
- **Computer vision**: Semantic segmentation post-processing  
- **Geospatial analysis**: Raster classification cleaning
- **Machine learning**: Neural network output refinement

## How It Works

MultiClean uses morphological operations to clean classification arrays:

1. **Edge smoothing (per class)**: Morphological opening with an elliptical kernel.
2. **Island removal (per class)**: Find connected components (OpenCV) and mark components with area `< min_island_size` as invalid.
3. **Gap filling**: Compute a distance transform to copy the nearest valid class into invalid pixels.

Classes are processed together, and every pixel in the result carries a valid class label. The exception is float input, where `NaN` nodata pixels stay `NaN`.
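
The three steps can be sketched with NumPy and SciPy alone. This is an illustration of the technique, not MultiClean's actual code: it swaps in `scipy.ndimage` for the OpenCV calls, uses SciPy's default cross-shaped structuring element rather than an elliptical kernel, and the names `clean_sketch`, `INVALID`, and `open_iterations` are assumptions made for this example.

```python
import numpy as np
from scipy import ndimage

INVALID = -1  # sentinel for pixels awaiting gap fill (assumption for this sketch)


def clean_sketch(labels, min_island_size=4, open_iterations=1):
    """Illustrative three-step clean for integer labels; not MultiClean's code."""
    out = np.full(labels.shape, INVALID, dtype=np.int64)
    for cls in np.unique(labels):
        mask = labels == cls
        # 1. Edge smoothing: morphological opening of the class mask
        if open_iterations > 0:
            mask = ndimage.binary_opening(mask, iterations=open_iterations)
        # 2. Island removal: keep only components with area >= min_island_size
        comp, n = ndimage.label(mask)  # default structure is 4-connectivity
        sizes = np.bincount(comp.ravel(), minlength=n + 1)
        keep = sizes >= min_island_size
        keep[0] = False  # component 0 is the background of this class mask
        out[keep[comp]] = cls
    # 3. Gap filling: copy the nearest valid label into each invalid pixel
    invalid = out == INVALID
    if invalid.any() and not invalid.all():
        idx = ndimage.distance_transform_edt(
            invalid, return_distances=False, return_indices=True
        )
        out = out[tuple(idx)]
    return out


# Two classes split down the middle, with one stray pixel of class 1 on the left
demo = np.zeros((8, 8), dtype=np.int64)
demo[:, 4:] = 1
demo[2, 1] = 1
result = clean_sketch(demo)
print(result)
```

In this sketch the stray pixel is removed and then refilled from its neighbours. Pixels that are equidistant from two classes are resolved by whichever nearest index the distance transform returns.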

## API

### `clean_array`

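```python
from __future__ import annotations  # allows `int | None` annotations on Python 3.9

import numpy as np

# Signature shown for reference only; import the real function with
# `from multiclean import clean_array`.
def clean_array(
    array: np.ndarray,
    class_values: int | list[int] | None = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 4,
    max_workers: int | None = None,
) -> np.ndarray: ...
```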

- `array`: 2D numpy array of class labels (int or float). For float arrays, `NaN` is treated as nodata and will remain `NaN`.
- `class_values`: Classes to consider. If `None`, inferred from `array` (ignores `NaN` for floats). An int restricts cleaning to a single class.
- `smooth_edge_size`: Kernel size (pixels) for morphological opening. Use `0` to disable.
- `min_island_size`: Remove components with area strictly `< min_island_size`. With `1`, nothing is removed, so even single-pixel components are kept.
- `connectivity`: Pixel connectivity for components, `4` or `8`.
- `max_workers`: Parallelism for per-class operations (None lets the executor choose).
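
To illustrate what the `connectivity` choice means, the sketch below counts components in a tiny mask whose two pixels touch only at a corner. It uses `scipy.ndimage.label` as a stand-in for the OpenCV routine the library is described as using, so it demonstrates the concept rather than MultiClean's internals.

```python
import numpy as np
from scipy import ndimage

# Two pixels that touch only at a corner
mask = np.array(
    [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]],
    dtype=bool,
)

four = ndimage.generate_binary_structure(2, 1)   # edge neighbours only
eight = ndimage.generate_binary_structure(2, 2)  # edge and corner neighbours

_, n_four = ndimage.label(mask, structure=four)
_, n_eight = ndimage.label(mask, structure=eight)
print(n_four, n_eight)  # 2 1
```

Under 4-connectivity these are two single-pixel components, while under 8-connectivity they form one two-pixel component. Diagonal features are therefore more likely to survive island removal with `connectivity=8`.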

Returns a numpy array matching the input shape. Integer inputs return integer outputs. Float arrays with `NaN` are supported (treated as nodata and retained as `NaN`).
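
The nodata behaviour described above can be pictured with plain NumPy. This is a sketch of the idea only: the variable names are illustrative, and the `np.where` line is a stand-in for a real cleaning step rather than anything MultiClean does.

```python
import numpy as np

labels = np.array(
    [[1.0, 1.0, np.nan],
     [2.0, 1.0, np.nan],
     [2.0, 2.0, 3.0]]
)

# Remember where nodata is, and infer class values from the remaining pixels
nodata = np.isnan(labels)
class_values = np.unique(labels[~nodata]).astype(int).tolist()
print(class_values)  # [1, 2, 3]

# Stand-in for a cleaning step that may overwrite nodata pixels
processed = np.where(nodata, 0.0, labels)

# Restore nodata from the saved mask so NaN pixels stay NaN
processed[nodata] = np.nan
```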


## Performance

MultiClean is optimised for large arrays:

- **Vectorised operations** using NumPy, OpenCV, and SciPy
- **Parallel processing** for island detection across classes
- **Fast distance transforms** for gap filling
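
Per-class island detection parallelises naturally because each class mask can be labelled independently. The sketch below shows one way to do that with `concurrent.futures.ThreadPoolExecutor`, where `max_workers=None` lets the executor pick a worker count. The function names are hypothetical, `scipy.ndimage.label` stands in for the OpenCV routine, and the use of a thread pool is an assumption about how such parallelism could be arranged, not a description of MultiClean's internals.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy import ndimage


def small_island_mask(labels, cls, min_island_size):
    """Boolean mask of pixels in components of `cls` smaller than min_island_size."""
    comp, n = ndimage.label(labels == cls)
    sizes = np.bincount(comp.ravel(), minlength=n + 1)
    small = sizes < min_island_size
    small[0] = False  # component 0 is background, never an island
    return small[comp]


def find_small_islands(labels, class_values, min_island_size, max_workers=None):
    """Run small_island_mask for each class in a thread pool and merge the masks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        masks = pool.map(
            lambda cls: small_island_mask(labels, cls, min_island_size),
            class_values,
        )
        return np.logical_or.reduce(list(masks))


labels = np.zeros((6, 6), dtype=np.int32)
labels[0, 0] = 1        # single-pixel island of class 1
labels[3:6, 3:6] = 1    # 9-pixel region of class 1
islands = find_small_islands(labels, [0, 1], min_island_size=4, max_workers=2)
print(int(islands.sum()))  # 1
```

Whether threads give a real speed-up depends on how much of the per-class work runs outside the Python interpreter lock, so it is worth timing on representative data.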

## Examples

### Cleaning Satellite Land Cover Data

```python
from multiclean import clean_array
import rasterio

# Read land cover classification
with rasterio.open('landcover.tif') as src:
    landcover = src.read(1)

# Clean with appropriate parameters for satellite data
cleaned = clean_array(
    landcover,
    class_values=[0, 1, 2, 3, 4],  # forest, water, urban, crop, other
    smooth_edge_size=1,
    min_island_size=25,
    connectivity=8,
)
```

### Cleaning Neural Network Segmentation Output

```python
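import numpy as np
from multiclean import clean_array

# Stand-in for real model output: logits with shape (C, H, W).
# Replace with your model's logits converted to a NumPy array.
np_model_logits = np.random.rand(4, 256, 256).astype(np.float32)

# Convert logits to class predictions along the class axis
np_pred = np_model_logits.argmax(axis=0)  # shape: (H, W)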

# Clean the segmentation
cleaned = clean_array(
    np_pred,
    smooth_edge_size=2,
    min_island_size=100,
    connectivity=4,
)
```
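
After cleaning, it can be useful to check how much the prediction changed. The helper below is a hypothetical NumPy-only sketch, not part of MultiClean. It reports the fraction of pixels whose label changed and the per-class change in pixel count. It assumes integer labels: `NaN` never compares equal to itself, so nodata pixels in float arrays would be counted as changed.

```python
import numpy as np


def change_summary(before, after):
    """Return the overall changed fraction and per-class pixel-count deltas."""
    changed = before != after
    classes = np.union1d(np.unique(before), np.unique(after))
    deltas = {
        int(c): int((after == c).sum()) - int((before == c).sum()) for c in classes
    }
    return float(changed.mean()), deltas


before = np.array([[0, 0, 1], [0, 2, 1], [0, 0, 1]])
after = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1]])
fraction, deltas = change_summary(before, after)
print(fraction, deltas)
```

In real use, `before` would be `np_pred` and `after` would be `cleaned`. A large changed fraction may suggest that `smooth_edge_size` or `min_island_size` is too aggressive for the data.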

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "multiclean",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "morphological-operations, multiclass, classification-arrays, image-processing, edge-cleaning, island-removal, segmentation, post-processing, numpy, opencv, remote-sensing, computer-vision, noise-reduction",
    "author": null,
    "author_email": "Nick Wright <nicholas.wright@dpird.wa.gov.au>",
    "download_url": "https://files.pythonhosted.org/packages/46/8a/441b83f9bd6de6d235c666bf1345b3f5f1ce54667b08e4a1f821db1a480b/multiclean-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python library for morphological cleaning of multiclass 2D numpy arrays (edge smoothing and island removal)",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://github.com/DPIRD-DMA/MultiClean"
    },
    "split_keywords": [
        "morphological-operations",
        " multiclass",
        " classification-arrays",
        " image-processing",
        " edge-cleaning",
        " island-removal",
        " segmentation",
        " post-processing",
        " numpy",
        " opencv",
        " remote-sensing",
        " computer-vision",
        " noise-reduction"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "47adc37739fd8def95531dda51fab169eb5779db6b1ceb503e74781a57ab44b0",
                "md5": "b4e16c3327413c916546d0b60ce47c51",
                "sha256": "edff1f69d1156c5808d09abc2749ab668d1f582a3cf3a59fe499cd9462a3625c"
            },
            "downloads": -1,
            "filename": "multiclean-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b4e16c3327413c916546d0b60ce47c51",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 9214,
            "upload_time": "2025-09-02T07:59:24",
            "upload_time_iso_8601": "2025-09-02T07:59:24.358123Z",
            "url": "https://files.pythonhosted.org/packages/47/ad/c37739fd8def95531dda51fab169eb5779db6b1ceb503e74781a57ab44b0/multiclean-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "468a441b83f9bd6de6d235c666bf1345b3f5f1ce54667b08e4a1f821db1a480b",
                "md5": "44e638a6346e2724c71309b4a4855c52",
                "sha256": "be900dc8791aa608a7ba9b5be5388659cecb4d3580056220cbb312980b40ae88"
            },
            "downloads": -1,
            "filename": "multiclean-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "44e638a6346e2724c71309b4a4855c52",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11835,
            "upload_time": "2025-09-02T07:59:25",
            "upload_time_iso_8601": "2025-09-02T07:59:25.609299Z",
            "url": "https://files.pythonhosted.org/packages/46/8a/441b83f9bd6de6d235c666bf1345b3f5f1ce54667b08e4a1f821db1a480b/multiclean-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-02 07:59:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DPIRD-DMA",
    "github_project": "MultiClean",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "multiclean"
}
        