<div align="center">
# HistoSlice
Preprocessing large medical images for machine learning made easy!
<p align="center">
<a href="https://lab.rmurai.com/HistoSlice/">Documentation</a> •
<a href="https://pypi.org/project/histoslice/">PyPI</a>
</p>
</div>
## Description
`HistoSlice` makes it easy to prepare your histological slide images for deep
learning models. You can easily cut large slide images into smaller tiles and then
preprocess those tiles (remove tiles with poor-quality tissue, finger marks, etc.).
> [!NOTE]
> This project was forked from [HistoPrep](https://github.com/jopo666/HistoPrep), and further modified for additional features and improvements.
## Installation
```bash
uv add histoslice
# or
pip install histoslice
```
## Usage
A typical workflow for training deep learning models with histological images is as
follows:
1. Cut each slide image into smaller tile images.
2. Preprocess the tile images by removing tiles with bad tissue or staining artifacts.
```bash
HistoSlice --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5 --metrics --thumbnail
```
Or you can use the `HistoSlice` Python API to do the same thing!
```python
from histoslice import SlideReader
# Read slide image.
reader = SlideReader("./slides/slide_with_ink.jpeg")
# Detect tissue.
threshold, tissue_mask = reader.get_tissue_mask(level=-1)
# Extract overlapping tile coordinates with less than 50% background.
tile_coordinates = reader.get_tile_coordinates(
    tissue_mask, width=512, overlap=0.5, max_background=0.5
)
# Save tile images with image metrics for preprocessing.
tile_metadata = reader.save_regions(
    "./train_tiles/",
    tile_coordinates,
    threshold=threshold,
    save_metrics=True,
    save_thumbnail=True,
)
```
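To make the `width` and `overlap` parameters more concrete, here is a minimal sketch of how overlapping tile coordinates can be laid out on a grid. It is not HistoSlice's implementation: the `tile_grid` helper is hypothetical, it assumes `overlap` is the fraction of a tile shared with its neighbour, and it ignores the tissue mask and `max_background` filtering entirely.

```python
def tile_grid(slide_width, slide_height, tile_width, overlap):
    """Return (x, y, w, h) tuples for square tiles laid out on a regular grid.

    Hypothetical helper for illustration only. Assumes `overlap` is the
    fraction of a tile shared with its neighbour, so the step between
    tiles is tile_width * (1 - overlap).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in the range [0, 1)")
    stride = max(1, round(tile_width * (1 - overlap)))
    # Tiles on the right and bottom edges may extend past the slide;
    # how a real reader pads or clips them is not modelled here.
    return [
        (x, y, tile_width, tile_width)
        for y in range(0, slide_height, stride)
        for x in range(0, slide_width, stride)
    ]


# A 2048 x 1024 slide with 512 px tiles and 50% overlap gives a 256 px stride.
tiles = tile_grid(2048, 1024, tile_width=512, overlap=0.5)
print(len(tiles), tiles[:3])
```

With `overlap=0.5`, each tile starts half a tile width after the previous one, so most of the slide area ends up in more than one tile along each axis.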
Let's take a look at the output and visualise the thumbnails.
```bash
train_tiles
└── slide_with_ink
    ├── metadata.parquet       # tile metadata
    ├── properties.json        # tile properties
    ├── thumbnail.jpeg         # thumbnail image
    ├── thumbnail_tiles.jpeg   # thumbnail with tiles
    ├── thumbnail_tissue.jpeg  # thumbnail of the tissue mask
    └── tiles  [390 entries exceeds filelimit, not opening dir]
```



As the thumbnails show, histological slide images often contain areas that
we do not want to include in our training data. Removing them might seem like a
daunting task, but let's try it out!
```python
from histoslice.utils import OutlierDetector
# Let's wrap the tile metadata with a helper class.
detector = OutlierDetector(tile_metadata)
# Cluster tiles based on image metrics.
clusters = detector.cluster_kmeans(num_clusters=4, random_state=666)
# Visualise first cluster.
reader.get_annotated_thumbnail(
    image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]
)
```

Now we can mark tiles in cluster `0` as outliers!
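As a rough illustration of what marking a cluster as outliers amounts to, the sketch below drops every tile whose cluster label matches the outlier cluster. It uses plain Python lists with made-up coordinates and labels, and the `drop_cluster` helper is hypothetical. It is not part of the `HistoSlice` API, which may offer its own way to do this.

```python
def drop_cluster(coordinates, clusters, outlier_cluster):
    """Keep only the tiles whose cluster label differs from `outlier_cluster`.

    Hypothetical helper for illustration; assumes `clusters` holds one
    integer label per entry in `coordinates`, in the same order.
    """
    if len(coordinates) != len(clusters):
        raise ValueError("coordinates and clusters must have the same length")
    return [
        xywh for xywh, label in zip(coordinates, clusters) if label != outlier_cluster
    ]


# Made-up example data: four tiles, two of which were assigned to cluster 0.
coordinates = [(0, 0, 512, 512), (256, 0, 512, 512), (512, 0, 512, 512), (768, 0, 512, 512)]
clusters = [0, 2, 0, 1]

kept = drop_cluster(coordinates, clusters, outlier_cluster=0)
print(kept)
```

The remaining coordinates are the tiles you would keep for training.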
Raw data
{
"_id": null,
"home_page": null,
"name": "histoslice",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "WSI, Whole Slide Imaging, histology, image processing, slide",
"author": null,
"author_email": "Ryota Murai <opensource@rmurai.com>",
"download_url": "https://files.pythonhosted.org/packages/e3/2a/9f95ba57bf9aac6335b035e68b7ce1c628c1e14ed4ce71660033f2886c19/histoslice-0.1.0.tar.gz",
"platform": null,
"description": "<div align=\"center\">\n\n# HistoSlice\n\n[](https://pypi.org/project/histoslice/)\n[](https://pypi.org/project/histoslice/)\n[](./LICENSE)\n[](https://github.com/rmuraix/HistoSlice/actions/workflows/check.yaml)\n[](https://github.com/rmuraix/HistoSlice/actions/workflows/docs.yaml)\n[](https://codecov.io/github/rmuraix/HistoSlice)\n\nPreprocessing large medical images for machine learning made easy!\n\n<p align=\"center\">\n <a href=\"https://lab.rmurai.com/HistoSlice/\">Documentation</a> \u2022\n <a href=\"https://pypi.org/project/histoslice/\">PyPI</a>\n</p>\n\n</div>\n\n## Description\n\n`HistoSlice` makes is easy to prepare your histological slide images for deep\nlearning models. You can easily cut large slide images into smaller tiles and then\npreprocess those tiles (remove tiles with shitty tissue, finger marks etc).\n\n> [!NOTE]\n> This project was forked from [HistoPrep](https://github.com/jopo666/HistoPrep), and further modified for additional features and improvements.\n\n## Installation\n\n```bash\nuv add histoslice\n# or\npip install histoslice\n```\n\n## Usage\n\nTypical workflow for training deep learning models with histological images is the\nfollowing:\n\n1. Cut each slide image into smaller tile images.\n2. 
Preprocess smaller tile images by removing tiles with bad tissue, staining artifacts.\n\n```bash\nHistoSlice --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5 --metrics --thumbnail\n```\n\nOr you can use the `HistoSlice` python API to do the same thing!\n\n```python\nfrom histoslice import SlideReader\n\n# Read slide image.\nreader = SlideReader(\"./slides/slide_with_ink.jpeg\")\n# Detect tissue.\nthreshold, tissue_mask = reader.get_tissue_mask(level=-1)\n# Extract overlapping tile coordinates with less than 50% background.\ntile_coordinates = reader.get_tile_coordinates(\n tissue_mask, width=512, overlap=0.5, max_background=0.5\n)\n# Save tile images with image metrics for preprocessing.\ntile_metadata = reader.save_regions(\n \"./train_tiles/\",\n tile_coordinates,\n threshold=threshold,\n save_metrics=True,\n save_thumbnail=True\n)\n```\n\nLet's take a look at the output and visualise the thumbnails.\n\n```bash\ntrain_tiles\n\u2514\u2500\u2500 slide_with_ink\n \u251c\u2500\u2500 metadata.parquet # tile metadata\n \u251c\u2500\u2500 properties.json # tile properties\n \u251c\u2500\u2500 thumbnail.jpeg # thumbnail image\n \u251c\u2500\u2500 thumbnail_tiles.jpeg # thumbnail with tiles\n \u251c\u2500\u2500 thumbnail_tissue.jpeg # thumbnail of the tissue mask\n \u2514\u2500\u2500 tiles [390 entries exceeds filelimit, not opening dir]\n```\n\n\n\n\n\nAs we can see from the above images, histological slide images often contain areas that\nwe would not like to include into our training data. 
Might seem like a daunting task but\nlet's try it out!\n\n```python\nfrom histoslice.utils import OutlierDetector\n\n# Let's wrap the tile metadata with a helper class.\ndetector = OutlierDetector(tile_metadata)\n# Cluster tiles based on image metrics.\nclusters = detector.cluster_kmeans(num_clusters=4, random_state=666)\n# Visualise first cluster.\nreader.get_annotated_thumbnail(\n image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]\n)\n```\n\n\n\nNow we can mark tiles in cluster `0` as outliers!\n",
"bugtrack_url": null,
"license": null,
"summary": "Read and process histological slide images with python!",
"version": "0.1.0",
"project_urls": {
"Changelog": "https://github.com/rmuraix/HistoSlice/releases",
"Documentation": "https://lab.rmurai.com/HistoSlice/",
"Homepage": "https://github.com/rmuraix/HistoSlice",
"Issues": "https://github.com/rmuraix/HistoSlice/issues",
"Repository": "https://github.com/rmuraix/HistoSlice.git"
},
"split_keywords": [
"wsi",
" whole slide imaging",
" histology",
" image processing",
" slide"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "56428b8a3fbf56d101d30ee6fecab6ef7a67b8d4ad58ed2c231d4d658c045472",
"md5": "002fd08ad710c29f53d4fd4378c34f4c",
"sha256": "088798b63cefb35b1c481de62c1022a451d7c6b6b5fb68e94be710bbcd958d81"
},
"downloads": -1,
"filename": "histoslice-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "002fd08ad710c29f53d4fd4378c34f4c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 45077,
"upload_time": "2025-08-20T09:13:52",
"upload_time_iso_8601": "2025-08-20T09:13:52.706796Z",
"url": "https://files.pythonhosted.org/packages/56/42/8b8a3fbf56d101d30ee6fecab6ef7a67b8d4ad58ed2c231d4d658c045472/histoslice-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e32a9f95ba57bf9aac6335b035e68b7ce1c628c1e14ed4ce71660033f2886c19",
"md5": "cbdeb04778d8448b8435dfb0b49ed389",
"sha256": "588c6a18ab330be52188ae9d1b435f28aebde62b8839d48217738aa626c4c9f6"
},
"downloads": -1,
"filename": "histoslice-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "cbdeb04778d8448b8435dfb0b49ed389",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 4414883,
"upload_time": "2025-08-20T09:13:54",
"upload_time_iso_8601": "2025-08-20T09:13:54.460269Z",
"url": "https://files.pythonhosted.org/packages/e3/2a/9f95ba57bf9aac6335b035e68b7ce1c628c1e14ed4ce71660033f2886c19/histoslice-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-20 09:13:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "rmuraix",
"github_project": "HistoSlice",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "histoslice"
}