# undouble
The aim of ``undouble`` is to detect (near-)identical images. It works in multiple steps: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping the images. A threshold of 0 groups only images with an identical image hash. The results can be explored with the plotting functionality, and images can be moved with the move functionality. When moving images, the image with the largest resolution in each group is copied, and all other images in the group are moved to the **undouble** subdirectory. If you want to cluster your images instead, I recommend reading the [blog](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128) and using the [clustimage library](https://erdogant.github.io/clustimage).
The following steps are taken in the ``undouble`` library:
* Recursively read all images with the specified extensions from the directory.
* Compute image hash.
* Group similar images.
* Move if desired.
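
To make the hashing and grouping steps concrete, here is a minimal pure-Python sketch of the idea. It is not the code used inside ``undouble``: the function names `average_hash`, `hamming`, and `group_similar` are hypothetical, the images are tiny hand-written grayscale grids (so pre-processing is assumed to have happened already), and a simple average hash stands in for whichever hash function you select in the library.

```python
from itertools import combinations


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(int(value > mean) for value in flat)


def hamming(hash_a, hash_b):
    """Count the bit positions in which two equally long hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def group_similar(hashes, threshold=0):
    """Group names whose hashes differ by at most `threshold` bits."""
    names = list(hashes)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links until the representative of the group is found.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if hamming(hashes[a], hashes[b]) <= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member contain duplicates.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical 2x2 grayscale "images"; a_copy.png is a slightly noisy copy of a.png.
images = {
    "a.png": [[10, 200], [220, 30]],
    "a_copy.png": [[12, 198], [215, 35]],
    "b.png": [[200, 10], [30, 220]],
}
hashes = {name: average_hash(pixels) for name, pixels in images.items()}
groups = group_similar(hashes, threshold=0)
print(groups)
```

With a threshold of 0, only images with identical hashes end up in the same group; raising the threshold in this sketch would also group images whose hashes differ in a few bits.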
#
**⭐️ Star this repo if you like it ⭐️**
#
### Blogs
* Read the blog for a structured overview of how to [detect duplicate images using image hash functions.](https://erdogant.medium.com/detection-of-duplicate-images-using-image-hash-functions-4d9c53f04a75)
#
### [Documentation pages](https://erdogant.github.io/undouble/)
The [documentation pages](https://erdogant.github.io/undouble/) describe in detail how ``undouble`` works, with many examples.
#
### Installation
##### It is advisable to create a new environment (e.g. with Conda).
```bash
conda create -n env_undouble python=3.8
conda activate env_undouble
```
##### Install undouble from PyPI
```bash
pip install undouble # new install
pip install -U undouble # update to latest version
```
##### Install directly from the GitHub source
```bash
pip install git+https://github.com/erdogant/undouble
```
##### Import Undouble package
```python
from undouble import Undouble
```
<hr>
### Examples:
##### [Example: Grouping similar images of the flower dataset](https://erdogant.github.io/undouble/pages/html/Examples.html#)
<p align="left">
<a href="https://erdogant.github.io/undouble/pages/html/Examples.html#">
<img src="https://github.com/erdogant/undouble/blob/main/docs/figs/flowers1.png" width="400" />
</a>
</p>
<p align="left">
<a href="https://erdogant.github.io/undouble/pages/html/Examples.html#">
<img src="https://github.com/erdogant/undouble/blob/main/docs/figs/flowers2.png" width="400" />
</a>
</p>
<p align="left">
<a href="https://erdogant.github.io/undouble/pages/html/Examples.html#">
<img src="https://github.com/erdogant/undouble/blob/main/docs/figs/flowers3.png" width="400" />
</a>
</p>
#
##### [Example: List all file names that are identical](https://erdogant.github.io/undouble/pages/html/Examples.html#get-identical-images)
#
##### [Example: Moving similar images in the flower dataset](https://erdogant.github.io/undouble/pages/html/Examples.html#move-files)
```python
# -------------------------------------------------
# >You are at the point of physically moving files.
# -------------------------------------------------
# >[7] similar images are detected over [3] groups.
# >[4] images will be moved to the [undouble] subdirectory.
# >[3] images will be copied to the [undouble] subdirectory.
# >[C]ontinue moving all files.
# >[W]ait in each directory.
# >[Q]uit
# >Answer: w
```
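
The prompt above distinguishes between images that are copied and images that are moved. As described in the introduction, the image with the largest resolution in each group is copied and the others are moved. The sketch below illustrates only that selection rule; `split_group` and the dictionary layout are hypothetical and not part of the ``undouble`` API, and no files are touched.

```python
def split_group(group):
    """Return (keeper, others); the keeper has the most pixels (width * height)."""
    keeper = max(group, key=lambda item: item["width"] * item["height"])
    others = [item for item in group if item is not keeper]
    return keeper, others


# Hypothetical group of near-identical images with their resolutions.
group = [
    {"name": "rose_small.jpg", "width": 320, "height": 240},
    {"name": "rose_large.jpg", "width": 1280, "height": 960},
    {"name": "rose_medium.jpg", "width": 640, "height": 480},
]
keeper, others = split_group(group)
print("copy:", keeper["name"])
print("move:", [item["name"] for item in others])
```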
#
##### [Example: Plot the image hashes](https://erdogant.github.io/undouble/pages/html/Examples.html#plot-image-hash)
<p align="left">
<a href="https://erdogant.github.io/undouble/pages/html/Examples.html#plot-image-hash">
<img src="https://github.com/erdogant/undouble/blob/main/docs/figs/imghash_example.png" width="400" />
</a>
</p>
#
##### [Example: Three different imports](https://erdogant.github.io/undouble/pages/html/core_functions.html#input-data)
The input can be the following three types:
* Path to directory
* List of file locations
* Numpy array containing images
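
As an illustration of how the first two input types could be reduced to one list of image files, here is a small standard-library sketch. The helper `collect_files` and its default extensions are assumptions made for this example, not the library's own import routine, and NumPy arrays are left out because they need no file lookup.

```python
import tempfile
from pathlib import Path


def collect_files(source, extensions=("png", "jpg", "jpeg")):
    """Return image paths from a directory (searched recursively) or a list of paths."""
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    if isinstance(source, (str, Path)):
        # A single path is treated as a directory and searched recursively.
        candidates = sorted(p for p in Path(source).rglob("*") if p.is_file())
    else:
        # Anything else is treated as an iterable of file locations.
        candidates = [Path(item) for item in source]
    return [path for path in candidates if path.suffix.lower() in wanted]


# Build a throwaway directory tree to demonstrate the directory input.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.png", "notes.txt", "sub/b.JPG"):
        (root / name).touch()
    from_dir = [p.relative_to(root).as_posix() for p in collect_files(root)]

# The list input is filtered on extension only; the files need not exist here.
from_list = [p.name for p in collect_files(["x.png", "y.gif"])]
print(from_dir, from_list)
```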
#
##### [Example: Finding identical MNIST digits](https://erdogant.github.io/undouble/pages/html/Examples.html#mnist-dataset)
<p align="left">
<a href="https://erdogant.github.io/undouble/pages/html/Examples.html#mnist-dataset">
<img src="https://github.com/erdogant/undouble/blob/main/docs/figs/mnist_1.png" width="400" />
</a>
</p>
<hr>
### Citation
Please cite ``undouble`` in your publications if it is useful for your research (see citation).
### Maintainers
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
### Contribute
* All kinds of contributions are welcome!
* If you wish to buy me a <a href="https://www.buymeacoffee.com/erdogant">Coffee</a> for this work, it is much appreciated :)
### Licence
See [LICENSE](LICENSE) for details.
### Other interesting stuff
* https://github.com/JohannesBuchner/imagehash
* https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128
Raw data
{
"_id": null,
"home_page": "https://erdogant.github.io/undouble",
"name": "undouble",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": null,
"author": "Erdogan Taskesen",
"author_email": "erdogant@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/93/76/f03a06b37d568f6675bc13a8a2d9e860e7fc1c954aeca5bd37e7351c7eab/undouble-1.4.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Python package undouble",
"version": "1.4.6",
"project_urls": {
"Download": "https://github.com/erdogant/undouble/archive/1.4.6.tar.gz",
"Homepage": "https://erdogant.github.io/undouble"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "365cea84d5bb3c2ef57d9a17f127767f442f271105fb45b1c909d5416efc0b24",
"md5": "faea6a8b9b0c93c1caf106f821ef440e",
"sha256": "2081a35cd248e12b18a1c75ad725d64f766574be0d9cd8c998ec20af17c29555"
},
"downloads": -1,
"filename": "undouble-1.4.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "faea6a8b9b0c93c1caf106f821ef440e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3",
"size": 17751,
"upload_time": "2025-02-04T10:39:27",
"upload_time_iso_8601": "2025-02-04T10:39:27.966431Z",
"url": "https://files.pythonhosted.org/packages/36/5c/ea84d5bb3c2ef57d9a17f127767f442f271105fb45b1c909d5416efc0b24/undouble-1.4.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9376f03a06b37d568f6675bc13a8a2d9e860e7fc1c954aeca5bd37e7351c7eab",
"md5": "fbbfb803435a8ecb3659a29d442b7999",
"sha256": "19b45de37d6ae7bfcd3172ef30c041f549f48a2612cd71b3d3f74c42de6bb812"
},
"downloads": -1,
"filename": "undouble-1.4.6.tar.gz",
"has_sig": false,
"md5_digest": "fbbfb803435a8ecb3659a29d442b7999",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 18669,
"upload_time": "2025-02-04T10:39:29",
"upload_time_iso_8601": "2025-02-04T10:39:29.077459Z",
"url": "https://files.pythonhosted.org/packages/93/76/f03a06b37d568f6675bc13a8a2d9e860e7fc1c954aeca5bd37e7351c7eab/undouble-1.4.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-04 10:39:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "erdogant",
"github_project": "undouble",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "matplotlib",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "tqdm",
"specs": []
},
{
"name": "clustimage",
"specs": [
[
">=",
"1.6.6"
]
]
},
{
"name": "ismember",
"specs": []
},
{
"name": "datazets",
"specs": [
[
">=",
"1.0.0"
]
]
}
],
"lcname": "undouble"
}