# MultiClean
**MultiClean** is a Python library for morphological cleaning of multiclass 2D numpy arrays (segmentation masks and classification rasters). It provides efficient tools for edge smoothing and small-island removal across multiple classes, then fills gaps using the nearest valid class.
## Visual Example
Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).
<img src="https://raw.githubusercontent.com/DPIRD-DMA/MultiClean/main/assets/land_use_before_after.png" alt="Land Use before/after" width="900"/>
## Key Features
- **Multi-class processing**: Clean all classes in one pass
- **Edge smoothing**: Morphological opening to reduce jagged boundaries
- **Island removal**: Remove small connected components per class
- **Gap filling**: Fill invalids via nearest valid class (distance transform)
- **Fast**: NumPy + OpenCV + SciPy with parallelism
## Installation
```bash
pip install multiclean
```
or
```bash
uv add multiclean
```
## Quick Start
```python
import numpy as np
from multiclean import clean_array
# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)
# Clean with default parameters
cleaned = clean_array(array)
# Custom parameters
cleaned = clean_array(
    array,
    class_values=[0, 1, 2, 3],
    smooth_edge_size=2,  # kernel width, larger value increases smoothness
    min_island_size=100,  # remove components with area < 100
    connectivity=8,  # 4 or 8
    max_workers=4,
)
```
## Examples
See the notebooks folder for end-to-end examples:
- [Land Use Example Notebook](https://github.com/DPIRD-DMA/MultiClean/blob/main/notebooks/Land%20use%20example.ipynb): land use classification cleaning
- [Cloud Example Notebook](https://github.com/DPIRD-DMA/MultiClean/blob/main/notebooks/Cloud%20example.ipynb): cloud/shadow classification cleaning
## Try in Colab
[![Colab_Button]][Link]
[Link]: https://colab.research.google.com/github/DPIRD-DMA/MultiClean/blob/main/notebooks/Land%20use%20example%20(Colab).ipynb 'Try MultiClean In Colab'
[Colab_Button]: https://img.shields.io/badge/Try%20in%20Colab-grey?style=for-the-badge&logo=google-colab
## Use Cases
MultiClean is designed for cleaning segmentation outputs from:
- **Remote sensing**: Land cover classification, crop mapping
- **Computer vision**: Semantic segmentation post-processing
- **Geospatial analysis**: Raster classification cleaning
- **Machine learning**: Neural network output refinement
## How It Works
MultiClean uses morphological operations to clean classification arrays:
1. **Edge smoothing (per class)**: Morphological opening with an elliptical kernel.
2. **Island removal (per class)**: Find connected components (OpenCV) and mark components with area `< min_island_size` as invalid.
3. **Gap filling**: Compute a distance transform to copy the nearest valid class into invalid pixels.
Classes are processed together, and every pixel in the result carries a valid class label (for float inputs, `NaN` nodata pixels stay `NaN`).
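The three steps above can be sketched with SciPy alone. This is a minimal illustration, not MultiClean's actual implementation: `clean_sketch` is a hypothetical name, `scipy.ndimage` stands in for the OpenCV calls, a square kernel replaces the elliptical one, and `smooth_edge_size` is assumed to be the kernel width.

```python
import numpy as np
from scipy import ndimage


def clean_sketch(labels, smooth_edge_size=2, min_island_size=100):
    """Illustrative three-step clean (hypothetical helper, not the library code)."""
    keep = np.zeros(labels.shape, dtype=bool)  # pixels that survive steps 1 and 2

    for cls in np.unique(labels):
        mask = labels == cls

        # Step 1: morphological opening removes thin protrusions on class edges.
        # Assumption: a square kernel of width smooth_edge_size (the README
        # describes an elliptical kernel).
        if smooth_edge_size > 0:
            kernel = np.ones((smooth_edge_size, smooth_edge_size), dtype=bool)
            mask = ndimage.binary_opening(mask, structure=kernel)

        # Step 2: label 4-connected components and drop those that are too small.
        components, _ = ndimage.label(mask)
        areas = np.bincount(components.ravel())  # areas[0] is the background
        too_small = areas < min_island_size
        too_small[0] = True  # background pixels are never kept
        keep |= ~too_small[components]

    if not keep.any():
        return labels.copy()  # nothing valid to copy from

    # Step 3: each invalid pixel takes the label of its nearest valid pixel.
    _, (rows, cols) = ndimage.distance_transform_edt(~keep, return_indices=True)
    return labels[rows, cols]


# A 10x10 block of class 1 and a single stray pixel of class 2 on a class 0 background.
labels = np.zeros((20, 20), dtype=np.int32)
labels[8:18, 8:18] = 1
labels[2, 2] = 2
cleaned = clean_sketch(labels, smooth_edge_size=2, min_island_size=5)
```

In this toy input the stray class 2 pixel is invalidated and then refilled from the surrounding class 0 background, while the large block is left untouched.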
## API
### `clean_array`
```python
from multiclean import clean_array
# Signature and defaults, shown for reference (not a runnable call)
out = clean_array(
    array: np.ndarray,
    class_values: int | list[int] | None = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 4,
    max_workers: int | None = None,
)
```
- `array`: 2D numpy array of class labels (int or float). For float arrays, `NaN` is treated as nodata and will remain `NaN`.
- `class_values`: Classes to consider. If `None`, inferred from `array` (ignores `NaN` for floats). An int restricts cleaning to a single class.
- `smooth_edge_size`: Kernel size (pixels) for morphological opening. Use `0` to disable.
- `min_island_size`: Remove components with area strictly `< min_island_size`. Use `1` to keep single pixels.
- `connectivity`: Pixel connectivity for components, `4` or `8`.
- `max_workers`: Parallelism for per-class operations (None lets the executor choose).
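To make the `connectivity` and `min_island_size` options concrete, here is a small sketch of how the two settings interact. It uses `scipy.ndimage.label` as a stand-in for OpenCV's connected-components routine, so it illustrates the concept rather than MultiClean's internals.

```python
import numpy as np
from scipy import ndimage

# Two pixels of the same class that touch only at a corner.
mask = np.array([[1, 0],
                 [0, 1]], dtype=bool)

four = ndimage.generate_binary_structure(2, 1)   # edge neighbours only
eight = ndimage.generate_binary_structure(2, 2)  # edge and corner neighbours

_, n_four = ndimage.label(mask, structure=four)
_, n_eight = ndimage.label(mask, structure=eight)

print(n_four, n_eight)  # prints: 2 1
```

With 4-connectivity the pixels form two components of area 1, so a threshold of `min_island_size=2` would remove both. With 8-connectivity they form one component of area 2, which survives the same threshold because the comparison is strict.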
Returns a numpy array matching the input shape. Integer inputs return integer outputs. Float arrays with `NaN` are supported (treated as nodata and retained as `NaN`).
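The `NaN` handling described above can be pictured with plain NumPy. The following is a sketch of the documented behaviour under the assumption that nodata pixels are masked out before cleaning and restored afterwards; the sentinel value `-1` and the variable names are illustrative, not taken from the library.

```python
import numpy as np

arr = np.array([[1.0, 1.0, np.nan],
                [2.0, 1.0, np.nan]])

# Nodata pixels are identified once and excluded from class inference.
nodata = np.isnan(arr)
class_values = np.unique(arr[~nodata]).astype(int).tolist()

# Work on an integer copy, using a sentinel (-1, an assumption) for nodata.
work = np.where(nodata, -1, arr).astype(np.int32)

# ... per-class cleaning would operate on `work` here ...

# Restore the original dtype and put NaN back where it was.
restored = work.astype(arr.dtype)
restored[nodata] = np.nan
```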
## Performance
MultiClean is optimised for large arrays:
- **Vectorised operations** using NumPy, OpenCV, and SciPy
- **Parallel processing** for island detection across classes
- **Fast distance transforms** for gap filling
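Per-class parallelism of the kind listed above might look like the following sketch. It assumes a thread pool from `concurrent.futures`; the README does not say which executor MultiClean uses, and `count_components` and `components_per_class` are hypothetical helpers that use SciPy rather than OpenCV.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy import ndimage


def count_components(labels, cls):
    """Return (class value, number of 4-connected components of that class)."""
    _, n = ndimage.label(labels == cls)
    return cls, n


def components_per_class(labels, max_workers=None):
    """Run the per-class work concurrently; None lets the executor pick a size."""
    classes = [int(c) for c in np.unique(labels)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda c: count_components(labels, c), classes))


labels = np.array([[0, 0, 1],
                   [1, 0, 1],
                   [1, 0, 2]])
result = components_per_class(labels, max_workers=2)
```

Each class mask is independent of the others, which is what makes this step straightforward to spread across workers.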
## Usage Examples
### Cleaning Satellite Land Cover Data
```python
from multiclean import clean_array
import rasterio
# Read land cover classification
with rasterio.open('landcover.tif') as src:
    landcover = src.read(1)  # first band as a 2D array
# Clean with appropriate parameters for satellite data
cleaned = clean_array(
    landcover,
    class_values=[0, 1, 2, 3, 4],  # forest, water, urban, crop, other
    smooth_edge_size=1,
    min_island_size=25,
    connectivity=8,
)
```
### Cleaning Neural Network Segmentation Output
```python
from multiclean import clean_array
# np_model_logits is assumed to be a NumPy array of shape (C, H, W) holding the model's logits.
# Convert the logits to class predictions
np_pred = np_model_logits.argmax(axis=0)  # shape: (H, W)
# Clean the segmentation
cleaned = clean_array(
    np_pred,
    smooth_edge_size=2,
    min_island_size=100,
    connectivity=4,
)
```
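The `argmax` step depends on where the class axis sits in the logits. The sketch below uses synthetic random logits (the shapes and names are illustrative assumptions) to show that channel-first and channel-last layouts give the same `(H, W)` integer class map as long as the matching axis is used.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, height, width = 3, 4, 5

# Channel-first logits, shape (C, H, W): reduce over axis 0.
logits_chw = rng.normal(size=(num_classes, height, width))
pred = logits_chw.argmax(axis=0)

# Channel-last logits, shape (H, W, C): reduce over the last axis instead.
logits_hwc = np.moveaxis(logits_chw, 0, -1)
pred_hwc = logits_hwc.argmax(axis=-1)

print(pred.shape, pred.dtype.kind)  # (4, 5) and an integer dtype kind
```

Either way the result is a 2D integer array of class indices, which is the input `clean_array` expects.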
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.