| Field           | Value                                                                  |
|-----------------|------------------------------------------------------------------------|
| Name            | sdcat                                                                  |
| Version         | 1.24.0                                                                 |
| Summary         | Sliced Detection and Clustering Analysis Toolkit - Developed by MBARI  |
| Author          | Danelle Cline                                                          |
| License         | Apache                                                                 |
| Requires Python | <3.12,>=3.9                                                            |
| Upload time     | 2025-05-24 23:51:53                                                    |
[MBARI](http://www.mbari.org)
[semantic-release](https://github.com/semantic-release/semantic-release)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Python](https://www.python.org/downloads/)
[CI: pytest](https://github.com/mbari-org/sdcat/actions/workflows/pytest.yml)
**sdcat**
*Sliced Detection and Clustering Analysis Toolkit*
This repository processes images using a sliced detection and clustering workflow.
If your images look something like the image below, and you want to detect objects in them,
and optionally cluster the detections, then this repository may be useful to you.
The repository is designed to be run from the command line, and can be run in a Docker container,
with or without a GPU (a GPU is recommended).
To use multiple GPUs, use the --device cuda option.
To use a single GPU, use the --device cuda:0 option (a comma-separated list such as cuda:0,1 selects specific GPUs).
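For example, a detection run pinned to the first GPU might look like this (the directory arguments are placeholders, mirroring the other examples in this README):

```shell
# Detect using only the first GPU; use --device cuda instead to use all GPUs
sdcat detect --device cuda:0 --image-dir <image-dir> --save-dir <save-dir>
```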
---
Detection
---
Detection can be done with a fine-grained saliency-based detection model, and/or one of the following models run with the SAHI algorithm.
Both detection algorithms (saliency and object detection) are run by default and combined to produce the final detections.
SAHI is short for Slicing Aided Hyper Inference: a method that slices images into smaller windows and runs a detection model
on each window (a minimal sketch of the idea follows the model table below).
| Object Detection Model | Description |
|----------------------------------|--------------------------------------------------------------------|
| yolov8s | YOLOv8s model from Ultralytics |
| hustvl/yolos-small               | YOLOS model, a Vision Transformer (ViT)                            |
| hustvl/yolos-tiny                | YOLOS model, a Vision Transformer (ViT)                            |
| MBARI-org/megamidwater (default) | MBARI midwater YOLOv5x for general detection in midwater images |
| MBARI-org/uav-yolov5 | MBARI UAV YOLOv5x for general detection in UAV images |
| MBARI-org/yolov5x6-uavs-oneclass | MBARI UAV YOLOv5x for general detection in UAV images single class |
| FathomNet/MBARI-315k-yolov5 | MBARI YOLOv5x for general detection in benthic images |
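As mentioned above, the core SAHI idea fits in a few lines of plain Python. This is an illustrative sketch only, not sdcat's implementation; the window size, overlap ratio, and the `detect_fn` callback are assumptions:

```python
# Sketch of sliced inference: run a detector over overlapping windows and
# shift each box back into full-image coordinates.
from PIL import Image

def sliced_detect(image_path, detect_fn, slice_w=900, slice_h=900, overlap=0.2):
    """detect_fn(crop) -> list of (x1, y1, x2, y2, score) in crop coordinates."""
    image = Image.open(image_path)
    step_x, step_y = int(slice_w * (1 - overlap)), int(slice_h * (1 - overlap))
    detections = []
    for top in range(0, image.height, step_y):
        for left in range(0, image.width, step_x):
            crop = image.crop((left, top, left + slice_w, top + slice_h))
            for x1, y1, x2, y2, score in detect_fn(crop):
                detections.append((x1 + left, y1 + top, x2 + left, y2 + top, score))
    # In practice, overlapping boxes from adjacent windows are then merged (e.g. NMS).
    return detections
```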
To skip saliency detection, use the --skip-saliency option.
```shell
sdcat detect --skip-saliency --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900
```
To skip using the SAHI algorithm, use --skip-sahi.
```shell
sdcat detect --skip-sahi --image-dir <image-dir> --save-dir <save-dir> --model <model> --slice-size-width 900 --slice-size-height 900
```
---
ViTS + HDBSCAN Clustering
---
Once the detections are generated, they can be clustered. Alternatively,
detections can be clustered from a collection of cropped images, sometimes referred to as
regions of interest (ROIs), by providing them in a folder with the roi option.
```shell
sdcat cluster roi --roi <roi> --save-dir <save-dir> --model <model>
```
Clustering is done with a Vision Transformer (ViT) model and the HDBSCAN algorithm, using a cosine similarity metric.
The ViT model generates embeddings for the detections, and the HDBSCAN algorithm clusters those embeddings.
What is an embedding? An embedding is a vector representation of an object in an image.
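For a concrete picture, here is a minimal sketch of extracting one such embedding with the Hugging Face transformers library and the default `google/vit-base-patch16-224` model. Using the CLS token as the embedding is an assumption here; sdcat's own embedding code may differ:

```python
# Sketch: embed one detection crop with a ViT model via transformers
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

crop = Image.open("detection_crop.png").convert("RGB")  # hypothetical crop file
inputs = processor(images=crop, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0]  # CLS token: one vector per crop
print(embedding.shape)  # torch.Size([1, 768]) for the base model
```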
The defaults are set to produce fine-grained clusters, but the parameters can be adjusted to produce coarser clusters.
The algorithm workflow looks like this:

| Vision Transformer (ViT) Models | Description |
|--------------------------------------|--------------------------------------------------------------------------------|
| google/vit-base-patch16-224 (default) | 16 block size trained on ImageNet21k with 21k classes                          |
| facebook/dino-vits8 | trained on ImageNet which contains 1.3 M images with labels from 1000 classes |
| facebook/dino-vits16 | trained on ImageNet which contains 1.3 M images with labels from 1000 classes |
| MBARI-org/mbari-uav-vit-b-16 | MBARI UAV vits16 model trained on 10425 UAV images with labels from 21 classes |
A smaller block size means more patches and more accurate fine-grained clustering on smaller objects, so
ViT models with a block size of 8 are recommended for fine-grained clustering on small objects, and 16 is recommended for coarser clustering on
larger objects. We recommend running with multiple models to see which model works best for your data,
and experimenting with the --min-samples and --min-cluster-size options to get good clustering results.
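To make those knobs concrete, here is a minimal sketch of clustering embeddings with HDBSCAN under a cosine metric. Precomputing the distance matrix is one common way to get cosine behavior; the parameter values shown are illustrative, not sdcat's defaults:

```python
# Sketch: HDBSCAN over a precomputed cosine distance matrix
import numpy as np
import hdbscan
from sklearn.metrics.pairwise import cosine_distances

def cluster_embeddings(embeddings, min_cluster_size=2, min_samples=1):
    dist = cosine_distances(np.asarray(embeddings)).astype(np.float64)
    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size,
        min_samples=min_samples,
        metric="precomputed",
    )
    return clusterer.fit_predict(dist)  # label -1 marks noise (unclustered)
```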
# Installation
Pip install the sdcat package with:
```bash
pip install sdcat
```
Alternatively, [Docker](https://www.docker.com) can be used to run the code. A pre-built docker image is available at [Docker Hub](https://hub.docker.com/r/mbari/sdcat) with the latest version of the code.
Detection:
```shell
docker run -it -v $(pwd):/data mbari/sdcat detect --image-dir /data/images --save-dir /data/detections --model MBARI-org/uav-yolov5
```
Followed by clustering:
```shell
docker run -it -v $(pwd):/data mbari/sdcat cluster detections --det-dir /data/detections/ --save-dir /data/detections --model MBARI-org/uav-yolov5
```
A GPU is recommended for both clustering and detection. If you don't have a GPU, you can still run the code, but it will be slower.
If running on a CPU, multiple cores are recommended and will speed up processing. To run with GPUs in Docker, use the CUDA image and pass the --gpus flag:
```shell
docker run -it --gpus all -v $(pwd):/data mbari/sdcat:cuda124 detect --image-dir /data/images --save-dir /data/detections --model MBARI-org/uav-yolov5
```
# Commands
To get all options available, use the --help option. For example:
```shell
sdcat --help
```
which will print out the following:
```shell
Usage: sdcat [OPTIONS] COMMAND [ARGS]...

  Process images from a command line.

Options:
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  cluster  Cluster detections.
  detect   Detect objects in images
```
To get details on a particular command, use the --help option with the command. For example, with the **cluster** command:
```shell
sdcat cluster --help
```
which will print out the following:
```shell
Usage: sdcat cluster [OPTIONS] COMMAND [ARGS]...

  Commands related to clustering images

Options:
  -h, --help  Show this message and exit.

Commands:
  detections  Cluster detections.
  roi         Cluster roi.
```
## File organization
The sdcat toolkit generates data in the following folders.
For detections, the output is organized in a folder with the following structure:
```
/data/20230504-MBARI/
└── detections
    └── hustvl
        └── yolos-small                     # The model used to generate the detections
            ├── det_raw                     # The raw detections from the model
            │   └── csv
            │       ├── DSC01833.csv
            │       ├── DSC01859.csv
            │       ├── DSC01861.csv
            │       └── DSC01922.csv
            ├── det_filtered                # The filtered detections from the model
            ├── crops                       # Crops of the detections
            ├── dino_vits8...date           # The clustering results - one folder per run of the clustering algorithm
            ├── dino_vits8..detections.csv  # The detections with the cluster id
            ├── stats.txt                   # Statistics of the detections
            └── vizresults                  # Visualizations of the detections (boxes overlaid on images)
                ├── DSC01833.jpg
                ├── DSC01859.jpg
                ├── DSC01861.jpg
                └── DSC01922.jpg
```
For clustering, the output is organized in a folder with the following structure:
```
/data/20230504-MBARI/
└── clusters
    └── crops                                        # The detection crops/ROIs, embeddings and predictions
        ├── dino_vit8..._cluster_detections.parquet  # The detections with the cluster id and predictions in parquet format
        ├── dino_vit8..._cluster_detections.csv      # The detections with the cluster id and predictions
        ├── dino_vit8..._cluster_config.ini          # Copy of the config file used to run the clustering
        ├── dino_vit8..._cluster_summary.json        # Summary of the clustering results
        ├── dino_vit8..._cluster_summary.png         # 2D plot of the clustering results
        └── dino_vit8...
            ├── dino_vits8.._cluster_1_p0.png        # Cluster 1 page 0 grid plot
            ├── dino_vits8.._cluster_1_p1.png        # Cluster 1 page 1 grid plot
            ├── dino_vits8.._cluster_2_p0.png        # Cluster 2 page 0 grid plot
            └── dino_vits8.._cluster_noise_p0.png    # Noise (unclustered) page 0 grid plot
```
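To inspect these outputs programmatically, a minimal pandas sketch might look like the following; the glob pattern follows the naming above, and the exact file names and columns depend on your run:

```python
# Sketch: load the clustered detections table from a cluster run
from pathlib import Path
import pandas as pd

crops_dir = Path("/data/20230504-MBARI/clusters/crops")
csvs = sorted(crops_dir.glob("*_cluster_detections.csv"))  # one per clustering run
df = pd.read_csv(csvs[0])
print(df.columns.tolist())  # check the available columns before relying on names
print(df.head())
```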
Example grid plot of the clustering results:

## Process images, creating bounding box detections with the YOLOv8s model

The YOLOv8s model is not as accurate as other models, but it is fast, good for detecting larger objects in images,
and well suited for experiments and quick results.
**Slice size** is the size of the detection window. The default is to allow the SAHI algorithm to determine the slice size;
a smaller slice size will take longer to process.
```shell
sdcat detect --image-dir <image-dir> --save-dir <save-dir> --model yolov8s --slice-size-width 900 --slice-size-height 900
```
## Cluster detections from the YOLOv8s model, but use the classifications from the ViT model

Cluster the detections from the YOLOv8s model. The detections are clustered using cosine similarity and embedding
features from the default Vision Transformer (ViT) model `google/vit-base-patch16-224`.
```shell
sdcat cluster detections --det-dir <det-dir>/yolov8s/det_filtered --save-dir <save-dir> --use-vits
```
# Related work
* https://github.com/obss/sahi SAHI
* https://arxiv.org/abs/2010.11929 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
* https://github.com/facebookresearch/dinov2 DINOv2
* https://arxiv.org/pdf/1911.02282.pdf HDBSCAN
* https://github.com/muratkrty/specularity-removal Specularity Removal